Abstract

Traditional control-flow analysis (CFA) for higher-order languages
introduces spurious connections between callers and callees, and
different invocations of a function may pollute each other’s return
flows. Recently, three distinct approaches have been published
that provide perfect call-stack precision in a computable manner:
CFA2, PDCFA, and AAC. Unfortunately, implementing CFA2 and
PDCFA requires significant engineering effort. Furthermore, all
three are computationally expensive. For a monovariant analysis,
CFA2 is in $O(2^n)$, PDCFA is in $O(n^6)$, and AAC is in $O(n^8)$.

In this paper, we describe a new technique that builds on these
but is both straightforward to implement and computationally
inexpensive. The crucial insight is an unusual state-dependent
allocation strategy for the addresses of continuations. Our technique
imposes only a constant-factor overhead on the underlying analysis
and costs only $O(n^3)$ in the monovariant case. We present the
intuitions behind this development, benchmarks demonstrating its
efficacy, and a proof of the precision of this analysis.

Categories and Subject Descriptors D.3.4 [Programming Languages]:
Processors and Optimization

Keywords Static analysis; Control-flow analysis; Abstract
interpretation; Pushdown analysis; Store-allocated continuations

1. Introduction 

Recent developments in the static analysis of higher-order
languages make it possible to obtain perfect precision in modeling the
call stack. This allows calls and returns to be matched up precisely
and avoids spurious return flows. Consider the following Racket
code, which binds an identity function and applies it on two
distinct values:

    (let* ([id (lambda (x) x)]
           [y (id #t)]
           [z (id #f)])
      ...)

Without a precise modeling of the call stack, the value #f can 
spuriously flow to the variable y, even when a technique like call 
sensitivity initially keeps them separate. 
To avoid this imprecision, Vardoulakis and Shivers […] introduce
a context-free approach (as in context-free languages, not
context sensitivity) to program analysis with CFA2. This technique
provides a computable, although exponential-time, method
for obtaining perfect stack precision for monovariant analyses of
continuation-passing-style programs. Two other approaches, PDCFA
and AAC, build on this work by enabling polyvariant (e.g.,
context-sensitive) analysis of direct-style programs and do so at
only a polynomial-factor increase to the run-time complexity of the
underlying analysis.

Earl et al. […] present a pushdown control-flow analysis (PDCFA),
which improves on traditional control-flow analysis by annotating
edges in the state graph with stack actions (i.e., push and
pop) that implicitly represent precise call stacks. However, this method
obtains its precision at a substantial increase in worst-case
complexity. For example, a monovariant PDCFA is in $O(n^6)$ where its
finite-state equivalent is in $O(n^3)$. Unfortunately, PDCFA also
requires significant machinery and presents challenges to engineers
responsible for constructing and maintaining such analyses.

Johnson and Van Horn […] present abstracting abstract control
(AAC), a refinement of store-allocated continuations (the
established finite-state method of merging stack frames into the store),
and define an allocator that is precise enough to avoid all spurious
merging. The key advantage of this method is that it is trivial to
implement in existing analysis frameworks that use store-allocated
continuations and comes at the cost of changing roughly one line of
code. Unfortunately, AAC is more computationally complex than
PDCFA: even in the monovariant case it is in $O(n^8)$.

We draw on the lessons learned from all three approaches and
present a technique for obtaining perfect call-stack precision at only
a constant-factor increase to run-time complexity over traditional
finite-state analysis (i.e., for free in terms of complexity) and
requiring no refactoring of analyses already using store-allocated
continuations (i.e., for free in terms of labor).

1.1 Contributions 

We contribute an efficient method for obtaining a perfectly precise 
modeling of the call stack in static analyses. Specifically: 

• We present a novel technique for obtaining perfect call-stack
precision at no asymptotic cost to run-time complexity and
requiring only a trivial change to analyses already using
store-allocated continuations. In the monovariant case, our analysis
is in $O(n^3)$, the same complexity class as a traditional 0-CFA.

• We illustrate the intuition behind our approach and explain why 
previous PTIME methods (PDCFA and AAC) fail to exploit it. 

• We describe our implementation and provide benchmarks that 
demonstrate its efficacy. 

• We define a relationship between our technique and a static 
analysis that uses unbounded stacks and use it to prove the 
precision of our method. 
1.2 Outline

Section 2 defines a simple direct-style language and its operational
semantics. It presents the relevant background on abstract
interpretation using abstract machines, soundness, store widening, and
concepts necessary to understanding our technique. We close this
section by giving a walkthrough of the above example, illustrating
precisely how values become merged in a traditional analysis.

In section 3, we formalize an incomputable static analysis that
defines what is meant by perfect stack precision. This analysis
loses no precision in its modeling of the call stack but requires
an infinite number of unbounded stacks to be explored. We then
review PDCFA and AAC, the existing polynomial-time approaches
to obtaining an equivalent stack precision.

In section 4, we formalize our technique, give the intuitions
that led us to it, and explain how it relates to each of the analyses
described in section 3. We describe our implementation and present
both monovariant and call-sensitive allocation benchmark results
that compare the complexity and precision of our technique to that
of AAC.

Section 5 provides a formal relationship between the unbounded-stack
machine of section 3 and our improved finite-state analysis.
We use this relationship to prove the perfect precision of our
method.


2. Background 

Static analysis by abstract interpretation proves properties of a
program by running its code through an interpreter powered by an
abstract semantics that approximates the behavior of a concrete
semantics. This process is a general method for analyzing programs
and serves applications such as program verification,
malware/vulnerability detection, and compiler optimization, among
others […]. The abstracting abstract machines (AAM) approach
uses abstract interpretation of abstract machines for control-flow
analysis (CFA) of functional (higher-order) programming languages
[…]. The AAM methodology allows a high degree
of control over how program states are represented and is easy to
instrument.

In this section, we review operational semantics and abstract
interpretation using AAM along with other concepts we will
require as we progress. We present a concrete interpretation of a
simple direct-style language, a traditional finite-state abstraction,
and a store-widened polynomial-time analysis. We then explore the
return-flow merging problem in greater detail.


2.1 Concrete Semantics 

We will be using the direct-style (call-by-value, untyped)
λ-calculus in administrative normal form (ANF) […].


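$$
\begin{array}{rcll}
e \in \mathit{Exp} & ::= & (\texttt{let}\ ([y\ (f\ \mathit{ae})])\ e) & \text{[call]} \\
 & \mid & \mathit{ae} & \text{[return]} \\
f, \mathit{ae} \in \mathit{AExp} & ::= & x \mid \mathit{lam} & \text{[atomic expressions]} \\
\mathit{lam} \in \mathit{Lam} & ::= & (\lambda\ (x)\ e) & \text{[lambda abstractions]} \\
x, y \in \mathit{Var} & & \text{is a set of identifiers} & \text{[variables]}
\end{array}
$$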


All intermediate expressions are administratively let-bound, and
the order of operations is made explicit as a stack of such lets. This
not only simplifies our semantics, but is convenient for analysis as
every intermediate expression can naturally be given a unique
identifier. Additional core forms permitting mutation, recursive binding,
conditional branching, tail calls, and primitive operations add
complexity, but do not complicate the technique we aim to discuss and
so are left out.


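Our concrete interpreter operates over machine states $\varsigma$.

$$
\begin{array}{rcll}
\varsigma \in \Sigma & \triangleq & \mathit{Exp} \times \mathit{Env} \times \mathit{Store} \times \mathit{Kont} & \text{[states]} \\
\rho \in \mathit{Env} & \triangleq & \mathit{Var} \rightharpoonup \mathit{Addr} & \text{[environments]} \\
\sigma \in \mathit{Store} & \triangleq & \mathit{Addr} \rightharpoonup \mathit{Clo} & \text{[stores]} \\
\mathit{clo} \in \mathit{Clo} & \triangleq & \mathit{Lam} \times \mathit{Env} & \text{[closures]} \\
\kappa \in \mathit{Kont} & \triangleq & \mathit{Frame}^{*} & \text{[stacks]} \\
\phi \in \mathit{Frame} & \triangleq & \mathit{Var} \times \mathit{Exp} \times \mathit{Env} & \text{[stack frames]} \\
a \in \mathit{Addr} & & \text{is an infinite set} & \text{[addresses]}
\end{array}
$$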


Binding environments ($\rho$) map variables in scope to a
representative address ($a$). Value stores ($\sigma$) map these addresses
to a program value. (For pure λ-calculus, all values are closures.) Both
are partial functions that are incrementally extended with new points. A
closure ($\mathit{clo}$) pairs a syntactic lambda with an environment over
which it is closed. Continuations ($\kappa$) are unbounded sequences of
stack frames. Each stack frame ($\phi$) contains a variable to bind, an
expression control returns to, and an environment to reinstate.
Addresses ($a$) may be drawn from any set which permits us to generate
an arbitrary number of fresh values (e.g., $\mathbb{N}$).

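We define a helper $\mathcal{A} : \mathit{AExp} \times \mathit{Env} \times \mathit{Store} \rightharpoonup \mathit{Clo}$ for
atomic-expression evaluation:

$$
\begin{array}{rcll}
\mathcal{A}(x, \rho, \sigma) & = & \sigma(\rho(x)) & \text{[variable lookup]} \\
\mathcal{A}(\mathit{lam}, \rho, \sigma) & = & (\mathit{lam}, \rho) & \text{[closure creation]}
\end{array}
$$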

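A concrete transition relation $(\leadsto_\Sigma) : \Sigma \rightharpoonup \Sigma$ defines the
operation of this machine by determining at most one successor
for any given predecessor state. The machine stops when the end
of a program’s execution is reached or when given an invalid state.
Call sites transition according to the following transition rule:

$$
\begin{array}{l}
((\texttt{let}\ ([y\ (f\ \mathit{ae})])\ e), \rho, \sigma, \kappa) \leadsto_\Sigma (e', \rho', \sigma', \phi : \kappa), \text{ where} \\
\quad \phi = (y, e, \rho) \\
\quad ((\lambda\ (x)\ e'), \rho_\lambda) = \mathcal{A}(f, \rho, \sigma) \\
\quad \rho' = \rho_\lambda[x \mapsto a] \\
\quad \sigma' = \sigma[a \mapsto \mathcal{A}(\mathit{ae}, \rho, \sigma)] \\
\quad a \text{ is a fresh address}
\end{array}
$$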

A new frame $\phi$ is pushed onto the stack for eventually returning
to the body of this let-form. The atomic expression $f$ is either a
lambda-form or a variable reference and is evaluated to a closure
by our helper $\mathcal{A}$. In our notation, ticks are used to uniquely name
identifiers that may be different. These do not have any bearing
on the variable’s domain, but where possible will hint at usage
(e.g., a single tick for a successor’s components). A subscript may
be more significant, but we will be careful to point it out. This is
not the case for $\rho_\lambda$, which is used to name whatever environment
was drawn from the closure for $f$. This is simply an environment
distinct from $\rho$ and $\rho'$. We generate a fresh address $a$ (any address
such that $a \notin \mathit{dom}(\sigma)$) and update $\rho_\lambda$ with a mapping $x \mapsto a$ to
produce the successor environment $\rho'$. Likewise, the prior store $\sigma$
is extended at this address with the value for $\mathit{ae}$ to produce $\sigma'$.

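Return points transition according to a second rule:

$$
\begin{array}{l}
(\mathit{ae}, \rho, \sigma, \phi : \kappa) \leadsto_\Sigma (e, \rho', \sigma', \kappa), \text{ where} \\
\quad \phi = (x, e, \rho_\kappa) \\
\quad \rho' = \rho_\kappa[x \mapsto a] \\
\quad \sigma' = \sigma[a \mapsto \mathcal{A}(\mathit{ae}, \rho, \sigma)] \\
\quad a \text{ is a fresh address}
\end{array}
$$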

The top stack frame $\phi$ is decomposed and its environment $\rho_\kappa$
extended with a fresh address $a$ to produce $\rho'$. Likewise, the store
$\sigma$ is extended at this address with the value for $\mathit{ae}$ to produce $\sigma'$. The
expression $e$ from the top stack frame is reinstated under $\rho'$ and $\sigma'$,
and the predecessor’s stack tail $\kappa$ becomes the successor’s continuation.

To fully evaluate a program $e_0$ using these transition rules, we
inject it into our state space using a helper $\mathcal{I} : \mathit{Exp} \to \Sigma$:

$$\mathcal{I}(e) = (e, \varnothing, \varnothing, \epsilon)$$

We perform the standard lifting of $(\leadsto_\Sigma)$ to obtain a collecting
semantics defined over sets of states:

$$s \in S \triangleq \mathcal{P}(\Sigma)$$

Our collecting relation $(\leadsto_S)$ is a monotonic, total function that
gives a set including the trivially reachable state $\mathcal{I}(e_0)$ plus the set
of all states immediately succeeding those in its input.

$$s \leadsto_S s' \iff s' = \{\varsigma' \mid \varsigma \in s \wedge \varsigma \leadsto_\Sigma \varsigma'\} \cup \{\mathcal{I}(e_0)\}$$

If the program $e_0$ terminates, iteration of $(\leadsto_S)$ from $\bot$ (i.e., the
empty set $\varnothing$) does as well. That is, $(\leadsto_S)^n(\bot)$ is a fixed point
containing $e_0$’s full program trace for some $n \in \mathbb{N}$ whenever $e_0$ is a
terminating program. No such $n$ is guaranteed to exist in the
general case (when $e_0$ is a non-terminating program) as our language
(the untyped λ-calculus) is Turing-complete, our semantics is fully
precise, and the state space we defined is infinite.
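Read operationally, the two transition rules and the injection $\mathcal{I}$ amount to a small interpreter. The Python sketch below is one possible rendering, not the authors’ implementation; the tuple encoding of syntax, the extra `lit` form for #t and #f (not part of the core language above), and the counter used for fresh addresses are all assumptions made for illustration.

```python
from itertools import count

# Hypothetical tuple encoding of the ANF language (not from the paper):
#   ("var", x) | ("lam", x, body) | ("lit", v)    atomic expressions
#   ("let", y, f, ae, body)                        (let ([y (f ae)]) body)
# ("lit", v) is an assumed extension so the example can use #t and #f.

def atomic_eval(ae, env, store):
    """The helper A: variable lookup or closure creation."""
    tag = ae[0]
    if tag == "var":
        return store[env[ae[1]]]
    if tag == "lam":
        return (ae, env)                 # closure = (lambda, environment)
    if tag == "lit":
        return ae[1]
    raise ValueError(f"not an atomic expression: {ae!r}")

def step(state, fresh):
    """One concrete transition; returns None when no rule applies."""
    e, env, store, kont = state
    if e[0] == "let":                                        # [call]
        _, y, f, ae, body = e
        (_, x, lam_body), clo_env = atomic_eval(f, env, store)
        a = next(fresh)                                      # fresh address
        new_env = {**clo_env, x: a}
        new_store = {**store, a: atomic_eval(ae, env, store)}
        return (lam_body, new_env, new_store, [(y, body, env)] + kont)
    if kont:                                                 # [return]
        (x, body, frame_env), tail = kont[0], kont[1:]
        a = next(fresh)
        new_env = {**frame_env, x: a}
        new_store = {**store, a: atomic_eval(e, env, store)}
        return (body, new_env, new_store, tail)
    return None                                              # atomic + empty stack

def run(program):
    """Inject with I(e) = (e, {}, {}, []) and step until the machine stops."""
    fresh = count()
    state = (program, {}, {}, [])
    while (nxt := step(state, fresh)) is not None:
        state = nxt
    e, env, store, _ = state
    return atomic_eval(e, env, store)

# The introduction's example, written as a call so that id is let-bound:
ID = ("lam", "x", ("var", "x"))
BODY = ("let", "y", ("var", "id"), ("lit", "#t"),
        ("let", "z", ("var", "id"), ("lit", "#f"),
         ("var", "y")))
PROG = ("let", "r", ("lam", "id", BODY), ID, ("var", "r"))
print(run(PROG))  # #t
```

Because every binding receives a fresh address, the two calls to id never share a binding for x, and y can only ever hold #t.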

2.2 Abstract Semantics 

We are now ready to design a computable approximation of the
exact program trace using an abstract semantics. Previous work has
explored a wide variety of approaches to systematically abstracting
a semantics like this one […]. Broadly construed, the nature of
these changes is to simultaneously finitize the domains of our
machine while introducing non-determinism both into the transition
relation (multiple successor states may immediately follow a
predecessor state) and the store (multiple values may be indicated by a
single address). We use a finite state space to ensure computability.
However, to justify that a semantics defined over this finite machine
soundly approximates our concrete semantics (for a defined
notion of abstraction), we must also modify our finite states so that
a potentially infinite number of concrete states may abstract to a
single finite state. We use the term finite state to differentiate
these from other kinds of machine states. Components unique to this
finite-state machine wear tildes:


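$$
\begin{array}{rcll}
\tilde\varsigma \in \tilde\Sigma & \triangleq & \mathit{Exp} \times \widetilde{\mathit{Env}} \times \widetilde{\mathit{Store}} \times \widetilde{\mathit{KStore}} \times \widetilde{\mathit{Addr}} & \text{[states]} \\
\tilde\rho \in \widetilde{\mathit{Env}} & \triangleq & \mathit{Var} \rightharpoonup \widetilde{\mathit{Addr}} & \text{[environments]} \\
\tilde\sigma \in \widetilde{\mathit{Store}} & \triangleq & \widetilde{\mathit{Addr}} \to \tilde D & \text{[stores]} \\
\tilde d \in \tilde D & \triangleq & \mathcal{P}(\widetilde{\mathit{Clo}}) & \text{[flow-sets]} \\
\widetilde{\mathit{clo}} \in \widetilde{\mathit{Clo}} & \triangleq & \mathit{Lam} \times \widetilde{\mathit{Env}} & \text{[closures]} \\
\tilde\sigma_\kappa \in \widetilde{\mathit{KStore}} & \triangleq & \widetilde{\mathit{Addr}} \to \tilde K & \text{[continuation stores]} \\
\tilde k \in \tilde K & \triangleq & \mathcal{P}(\widetilde{\mathit{Kont}}) & \text{[kont-sets]} \\
\tilde\kappa \in \widetilde{\mathit{Kont}} & \triangleq & \widetilde{\mathit{Frame}} \times \widetilde{\mathit{Addr}} & \text{[continuations]} \\
\tilde\phi \in \widetilde{\mathit{Frame}} & \triangleq & \mathit{Var} \times \mathit{Exp} \times \widetilde{\mathit{Env}} & \text{[stack frames]} \\
\tilde a, \tilde a_\kappa \in \widetilde{\mathit{Addr}} & & \text{is a finite set} & \text{[addresses]}
\end{array}
$$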


There were two fundamental sources of unboundedness in the
concrete machine: the value store (with an infinite domain of
addresses), and the current continuation (modeled as an unbounded
list of stack frames). We bound the value store ($\tilde\sigma$) by restricting
its domain to a finite set of addresses ($\tilde a$), but we permit a set of
abstract closures ($\widetilde{\mathit{clo}}$) at each. We finitize the stack similarly by
threading it through the store as a linked list. A continuation is thus
represented by an address. This address points to a set of topmost
frames, each paired with the address of its continuation in turn (i.e.,
that stack’s tail). We separate the continuation store ($\tilde\sigma_\kappa$) from the
value store ($\tilde\sigma$) to maintain simplicity as we progress.

Abstract environments ($\tilde\rho$) change only because our address set
is now finite. Abstract closures ($\widetilde{\mathit{clo}}$) are approximate only by virtue
of their environments using these abstract addresses. For each such
$\tilde a$, the finite value store ($\tilde\sigma$) denotes a flow set ($\tilde d$) of closures. At
each point, a continuation store ($\tilde\sigma_\kappa$) has a set of continuations ($\tilde k$).
Like closures, each abstract frame ($\tilde\phi$) is approximate only by virtue
of its abstracted environment. An abstract continuation ($\tilde\kappa$) pairs a
frame with an address ($\tilde a_\kappa$) for the stack underneath.

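As before, we define a helper for abstract atomic evaluation, $\tilde{\mathcal{A}}$:

$$
\begin{array}{l}
\tilde{\mathcal{A}} : \mathit{AExp} \times \widetilde{\mathit{Env}} \times \widetilde{\mathit{Store}} \to \tilde D \\
\tilde{\mathcal{A}}(x, \tilde\rho, \tilde\sigma) = \tilde\sigma(\tilde\rho(x)) \quad \text{[variable lookup]} \\
\tilde{\mathcal{A}}(\mathit{lam}, \tilde\rho, \tilde\sigma) = \{(\mathit{lam}, \tilde\rho)\} \quad \text{[closure creation]}
\end{array}
$$

Note that atomic evaluation of a lambda expression now yields a
set containing a single element for the closure of that lambda.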

Because our address domain is now finite, multiple concrete
allocations need to be represented by a single abstract address. There
are a variety of sound strategies for doing this. Each strategy
corresponds to a distinct style of analysis and is amenable to easy
implementation by defining an auxiliary alloc helper to encapsulate
these differences in behavior. Given the variable for which to
allocate and the finite state performing the allocation, the abstract
allocator returns an address:

$$\mathit{alloc} : \mathit{Var} \times \tilde\Sigma \to \widetilde{\mathit{Addr}}$$

One such behavior is to simply return the variable itself (as a 0-CFA
would):

$$\mathit{alloc}_0(x, \tilde\varsigma) = x$$

Using $\mathit{alloc}_0$ would tune our finite-state semantics to the monovariant
analysis style (also called zeroth-order CFA), a form of
context-insensitive analysis. In a monovariant analysis, every closure that is
bound to a variable $x$ at any point during a concrete execution ends
up being represented in a single flow set when the analysis is
complete.

Because we are also now store-allocating continuations and
distinguishing a top-level continuation store, we likewise distinguish
an abstract allocator specifically for addresses in this store:

$$\mathit{alloc}_\kappa : \tilde\Sigma \times \mathit{Exp} \times \widetilde{\mathit{Env}} \times \widetilde{\mathit{Store}} \to \widetilde{\mathit{Addr}}$$

A standard choice is to allocate based on the target expression:

$$\mathit{alloc}_{\kappa 0}((e, \tilde\rho, \tilde\sigma, \tilde\sigma_\kappa, \tilde a_\kappa), e', \tilde\rho', \tilde\sigma') = e'$$

We provide to this function all the information known about the
transition being made. The value-store allocator is invoked before
a successor $\tilde\rho'$ or $\tilde\sigma'$ is constructed. However, when calling the
continuation-store allocator, we provide information about the
target state being transitioned to. The choice of $e'$ for allocating a
continuation address makes sense considering the entry point of a
function should know where it is returning. In fact, when performing
an analysis of a continuation-passing-style (CPS) language, $e'$
also would naturally be the choice inherited from a monovariant
value-store allocator (assuming an alpha-renaming such that every
$x$ is unique to a single binding point).
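Both allocators are plain functions of the transition’s data, which is what makes them easy to swap. The following minimal Python sketch assumes a hypothetical `AState` record standing in for $\tilde\varsigma = (e, \tilde\rho, \tilde\sigma, \tilde\sigma_\kappa, \tilde a_\kappa)$; it is an illustration of the signatures above, not an existing API.

```python
from typing import Any, NamedTuple

class AState(NamedTuple):
    """Hypothetical finite state (e, rho, sigma, sigma_k, a_k); field types left abstract."""
    e: Any
    env: Any
    store: Any
    kstore: Any
    a_k: Any

def alloc_0(x, state):
    """Monovariant (0-CFA) value allocation: the address is the variable itself."""
    return x

def alloc_k0(state, e_target, env_target, store_target):
    """Continuation allocation keyed only on the target expression e'."""
    return e_target

s = AState(e="e1", env=(), store={}, kstore={}, a_k="halt")
print(alloc_0("x", s), alloc_k0(s, "e0", (), {}))  # x e0
```

Since `alloc_k0` ignores the source state, every call into the same function body shares one continuation address regardless of the caller.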
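We may now define a non-deterministic finite-state transition
relation $(\leadsto_{\tilde\Sigma}) \subseteq \tilde\Sigma \times \tilde\Sigma$. Call sites transition as follows.

$$
\begin{array}{l}
((\texttt{let}\ ([y\ (f\ \mathit{ae})])\ e), \tilde\rho, \tilde\sigma, \tilde\sigma_\kappa, \tilde a_\kappa) \leadsto_{\tilde\Sigma} (e', \tilde\rho', \tilde\sigma', \tilde\sigma'_\kappa, \tilde a'_\kappa), \text{ where} \\
\quad ((\lambda\ (x)\ e'), \tilde\rho_\lambda) \in \tilde{\mathcal{A}}(f, \tilde\rho, \tilde\sigma) \\
\quad \tilde\rho' = \tilde\rho_\lambda[x \mapsto \tilde a] \\
\quad \tilde\sigma' = \tilde\sigma \sqcup [\tilde a \mapsto \tilde{\mathcal{A}}(\mathit{ae}, \tilde\rho, \tilde\sigma)] \\
\quad \tilde a = \mathit{alloc}(x, \tilde\varsigma) \\
\quad \tilde\sigma'_\kappa = \tilde\sigma_\kappa \sqcup [\tilde a'_\kappa \mapsto \{((y, e, \tilde\rho), \tilde a_\kappa)\}] \\
\quad \tilde a'_\kappa = \mathit{alloc}_\kappa(\tilde\varsigma, e', \tilde\rho', \tilde\sigma')
\end{array}
$$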

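As $\tilde{\mathcal{A}}$ yields a set of abstract closures for $f$, a successor state is
produced for each. Likewise, so that each point in the store accumulates
all closures bound at that abstract address $\tilde a$, and so that we faithfully
over-approximate all the addresses $a$ that $\tilde a$ simulates, we use a
join operation when extending the store. The join of two stores
distributes point-wise as follows.

$$
\begin{array}{l}
\tilde\sigma \sqcup \tilde\sigma' = \lambda \tilde a.\ \tilde\sigma(\tilde a) \cup \tilde\sigma'(\tilde a) \\
\tilde\sigma_\kappa \sqcup \tilde\sigma'_\kappa = \lambda \tilde a_\kappa.\ \tilde\sigma_\kappa(\tilde a_\kappa) \cup \tilde\sigma'_\kappa(\tilde a_\kappa)
\end{array}
$$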
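Both joins have the same shape, since value stores and continuation stores are each maps from addresses to sets. A minimal Python sketch, assuming stores are encoded as dicts from addresses to frozensets (an illustrative choice), in which a missing address plays the role of $\bot$’s empty set:

```python
from typing import Dict, FrozenSet, Hashable

Store = Dict[Hashable, FrozenSet[Hashable]]

def join(s1: Store, s2: Store) -> Store:
    """Point-wise join: (s1 ⊔ s2)(a) = s1(a) ∪ s2(a).

    Addresses absent from a dict are treated as mapping to the empty set,
    matching the bottom store. The same function serves for both value
    stores and continuation stores.
    """
    return {a: s1.get(a, frozenset()) | s2.get(a, frozenset())
            for a in s1.keys() | s2.keys()}

sigma = {"x": frozenset({"clo1"})}
update = {"x": frozenset({"clo2"}), "y": frozenset({"clo1"})}
print(sorted(join(sigma, update)["x"]))  # ['clo1', 'clo2']
```

The join is commutative and idempotent, so the order in which extensions arrive never changes the resulting store.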

Instead of generating a fresh address for $\tilde a$, we use our abstract
allocation policy to select one. To instantiate a monovariant analysis
like 0-CFA, this address is simply the syntactic variable $x$.
Likewise, we generate an address for our continuation (a new stack
frame atop the current continuation) and extend the continuation
store.

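The return transition is modified in the same way:

$$
\begin{array}{l}
(\mathit{ae}, \tilde\rho, \tilde\sigma, \tilde\sigma_\kappa, \tilde a_\kappa) \leadsto_{\tilde\Sigma} (e, \tilde\rho', \tilde\sigma', \tilde\sigma_\kappa, \tilde a'_\kappa), \text{ where} \\
\quad ((x, e, \tilde\rho_\kappa), \tilde a'_\kappa) \in \tilde\sigma_\kappa(\tilde a_\kappa) \\
\quad \tilde\rho' = \tilde\rho_\kappa[x \mapsto \tilde a] \\
\quad \tilde\sigma' = \tilde\sigma \sqcup [\tilde a \mapsto \tilde{\mathcal{A}}(\mathit{ae}, \tilde\rho, \tilde\sigma)] \\
\quad \tilde a = \mathit{alloc}(x, \tilde\varsigma)
\end{array}
$$

Where multiple topmost stack frames are pointed to by $\tilde a_\kappa$, this
transition yields multiple successors. An updated environment and
store are produced as before, but the continuation store remains as
it was. The current continuation $\tilde a'_\kappa$ reinstated in each successor is
the address associated with each topmost stack frame.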

To approximately evaluate a program according to these abstract
semantics, we first define an abstract injection function, $\tilde{\mathcal{I}}$, where
the stores begin as functions, $\bot$, that map every abstract address to
the empty set.

$$
\begin{array}{l}
\tilde{\mathcal{I}} : \mathit{Exp} \to \tilde\Sigma \\
\tilde{\mathcal{I}}(e) = (e, \varnothing, \bot, \bot, \tilde a_{\mathit{halt}})
\end{array}
$$

The address $\tilde a_{\mathit{halt}}$ can be any otherwise unused address that is never
returned by the allocation function. Our machine will eventually be
unable to transition into this continuation and will then produce no
successors, which simulates the behavior of our concrete machine
upon reaching an empty stack ($\epsilon$).

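We again lift $(\leadsto_{\tilde\Sigma})$ to obtain a collecting semantics $(\leadsto_{\tilde S})$
defined over sets of states:

$$
\begin{array}{l}
\tilde s \in \tilde S \triangleq \mathcal{P}(\tilde\Sigma) \\
\tilde s \leadsto_{\tilde S} \tilde s' \iff \tilde s' = \{\tilde\varsigma' \mid \tilde\varsigma \in \tilde s \wedge \tilde\varsigma \leadsto_{\tilde\Sigma} \tilde\varsigma'\} \cup \{\tilde{\mathcal{I}}(e_0)\}
\end{array}
$$

Our collecting relation $(\leadsto_{\tilde S})$ is a monotonic, total function that
gives a set including the trivially reachable finite state $\tilde{\mathcal{I}}(e_0)$ plus
the set of all states immediately succeeding those in its input.

Because $\tilde\Sigma$ is now finite, we know the approximate evaluation of
even a non-terminating $e_0$ will terminate. That is, for some $n \in \mathbb{N}$,
the value $(\leadsto_{\tilde S})^n(\bot)$ is guaranteed to be a fixed point containing an
approximation of $e_0$’s full program trace […].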
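Termination of this iteration rests only on the finiteness of $\tilde\Sigma$. The generic Kleene iteration below is a hedged Python sketch: `collecting` and `lfp` are illustrative names, and the toy graph merely stands in for $(\leadsto_{\tilde\Sigma})$ over some finite state space. Although the graph contains a cycle (so a single run never halts), iterating from $\bot = \varnothing$ still reaches a fixed point.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

def collecting(step: Callable[[Hashable], Iterable[Hashable]], init: Hashable):
    """Lift a non-deterministic step to sets of states:
    s  |->  { q' | q in s and q ~> q' }  ∪  { init }."""
    def f(s: FrozenSet) -> FrozenSet:
        return frozenset(q2 for q in s for q2 in step(q)) | {init}
    return f

def lfp(f, bottom=frozenset()):
    """Kleene iteration f^n(bottom) until a fixed point is reached."""
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# A toy finite transition graph with a cycle (s3 -> s1): a run never halts,
# but the set of reachable states still stabilizes.
GRAPH = {"s0": {"s1", "s2"}, "s1": {"s3"}, "s2": {"s3"}, "s3": {"s1"}}
print(sorted(lfp(collecting(GRAPH.__getitem__, "s0"))))  # ['s0', 's1', 's2', 's3']
```

Each iteration can only add states, and there are finitely many, so the loop must stop; this is the same argument used for $(\leadsto_{\tilde S})$.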

2.3 Soundness 

An analysis is sound if the information it provides about a
program represents an accurate bound on the behavior of all
possible concrete executions. The kind of control-flow information the
finite-state analysis in section 2.2 obtains is a conservative
over-approximation of program behavior. It places an upper bound on
the propagation of closures through a program.

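To establish such a relationship between a concrete and abstract
semantics, we use Galois connections. A Galois connection is a
pair of functions for abstraction and concretization such that the
following holds.

$$
\alpha : S \to \tilde S \qquad \gamma : \tilde S \to S \qquad \alpha(s) \sqsubseteq \tilde s \iff s \sqsubseteq \gamma(\tilde s)
$$

Using this defined notion of simulation, we may show that our
abstract semantics approximates the concrete semantics by proving
that simulation is preserved across transition:

$$
\alpha(s) \sqsubseteq \tilde s \wedge s \leadsto_S s' \implies \tilde s \leadsto_{\tilde S} \tilde s' \wedge \alpha(s') \sqsubseteq \tilde s'
$$

Diagrammatically, with each vertical arrow read as $\alpha(\cdot) \sqsubseteq \cdot$, this is:

$$
\begin{array}{ccc}
\tilde s & \overset{\leadsto_{\tilde S}}{\longrightarrow} & \tilde s' \\
\alpha \big\uparrow & & \big\uparrow \alpha \\
s & \overset{\leadsto_S}{\longrightarrow} & s'
\end{array}
$$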

Both constructing analyses using Galois connections and proving
them sound using Galois connections have been extensively
explored in the literature […]. The analysis style we constructed
in section 2.2 has been previously proven sound using the above
method […].

2.4 Store Widening 

Various forms of widening and further approximations may be
layered on top of this naive analysis. One such approximation is
store widening, which is necessary for our analysis to be tractable
(i.e., polynomial time). To see why store widening is necessary,
let us consider the complexity of an analysis using $(\leadsto_{\tilde S})$. The
height of the power-set lattice $(\tilde S, \cup, \cap)$ is the number of elements
in $\tilde\Sigma$, which is the product of the numbers of expressions, environments,
value stores, continuation stores, and addresses. A standard worklist
algorithm does work at most proportional to the number of states it can
discover […]. For the imprecise allocators we have defined, analysis
run-time is thus in:

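$$
O(\underbrace{n}_{|\mathit{Exp}|} \times \underbrace{n}_{|\widetilde{\mathit{Env}}|} \times \underbrace{2^{n^2}}_{|\widetilde{\mathit{Store}}|} \times \underbrace{2^{n^2}}_{|\widetilde{\mathit{KStore}}|} \times \underbrace{n}_{|\widetilde{\mathit{Addr}}|})
$$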

The number of syntactic points in an input program is in $O(n)$. In
the monovariant case, environments map variables to themselves
and are isomorphic to the sets of free variables that may be
determined for each syntactic point. The number of addresses produced
by our monovariant allocators is in $O(n)$ as these are either
syntactic variables or expressions. The number of value stores may
be visualized as a table of possible mappings from every address
to every abstract closure; each may be included in a given store
or not, as seen in figure 1. The number of abstract closures is in
$O(n)$ because lambdas uniquely determine a monovariant
environment. This times the number of addresses gives $O(n^2)$ possible
additions to the value store. The number of continuations is likewise
in $O(n)$ because let-forms uniquely determine their binding
variable, body, and monovariant environment. This times the number
of possible addresses gives $O(n^2)$ possible additions to the
continuation store. Since each possible addition is either present or
absent, there are $2^{O(n^2)}$ stores of each kind.

Figure 1. The value space of stores.

The crux of the issue is that, in exploring a naive state space
(where each state is specific to a whole store), we may explore both
sides of every diamond in the store lattices. All combinations of
possible bindings in a store may need to be explored, including
every alternate path up the store lattice. For example, along one
explored path we might extend an address $\tilde a_1$ with $\widetilde{\mathit{clo}}_1$ before
extending it with $\widetilde{\mathit{clo}}_2$, and along another path we might add these
closures in the reverse order (i.e., $\widetilde{\mathit{clo}}_2$ before $\widetilde{\mathit{clo}}_1$). We might also
extend another address $\tilde a_2$ with $\widetilde{\mathit{clo}}_1$ either before or after either
of these cases, and so forth. This potential for exponential
blow-up is unavoidable without further widening or coarser structural
abstraction.
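The diamond problem can be made concrete by counting. The Python sketch below is illustrative only (the address and closure names are placeholders): it represents a store as the set of filled cells of the table in figure 1 and enumerates every intermediate store met along every ordering of $k$ additions. Every subset appears, so a naive exploration may visit $2^k$ distinct stores, whereas a single widened global store passes through only $k + 1$.

```python
from itertools import permutations

def stores_along_all_orders(additions):
    """Collect every intermediate store met when the given bindings are
    added one at a time, in every possible order (naive, per-state stores)."""
    seen = set()
    for order in permutations(additions):
        store = frozenset()              # a store as a set of (address, closure) bindings
        seen.add(store)
        for binding in order:
            store = store | {binding}
            seen.add(store)
    return seen

adds = [("a1", "clo1"), ("a1", "clo2"), ("a2", "clo1")]
print(len(stores_along_all_orders(adds)))  # 8, i.e. every subset of the 3 additions
# A single global (widened) store only ever climbs one chain of len(adds) + 1 stores.
```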

Global-store widening is an essential technique for combating
exponential blow-up. This lifts the store alongside a set of reachable
states instead of nesting them inside states in $\tilde\Sigma$. To formalize this,
we define new widened state spaces that pair a set of reachable
configurations (states sans stores) with a global value store and
global continuation store. Instead of accumulating whole stores,
and thereby all possible sequences of additions within such stores,
the analysis strictly accumulates new values in the store in the same
way $(\leadsto_{\tilde S})$ accumulates reachable states in an $\tilde s$:

$$
\begin{array}{rcll}
\tilde\xi \in \tilde\Xi & \triangleq & \tilde R \times \widetilde{\mathit{Store}} \times \widetilde{\mathit{KStore}} & \text{[state spaces]} \\
\tilde r \in \tilde R & \triangleq & \mathcal{P}(\tilde C) & \text{[reachable configurations]} \\
\tilde c \in \tilde C & \triangleq & \mathit{Exp} \times \widetilde{\mathit{Env}} \times \widetilde{\mathit{Addr}} & \text{[configurations]}
\end{array}
$$


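A widened transfer function $(\leadsto_{\tilde\Xi}) : \tilde\Xi \to \tilde\Xi$ may then be defined that,
like $(\leadsto_{\tilde S})$, is a monotonic, total function we may iterate to a fixed
point. This may be defined in terms of $(\leadsto_{\tilde\Sigma})$, as $(\leadsto_{\tilde S})$ was, by
transitioning each reachable configuration using the global stores to
yield a new set of reachable configurations and a set of stores whose
least upper bound is the new global store:

$$
\begin{array}{l}
(\tilde r, \tilde\sigma, \tilde\sigma_\kappa) \leadsto_{\tilde\Xi} (\tilde r', \tilde\sigma', \tilde\sigma'_\kappa), \text{ where} \\
\quad \tilde s = \{\tilde\varsigma' \mid (e, \tilde\rho, \tilde a_\kappa) \in \tilde r \wedge (e, \tilde\rho, \tilde\sigma, \tilde\sigma_\kappa, \tilde a_\kappa) \leadsto_{\tilde\Sigma} \tilde\varsigma'\} \cup \{\tilde{\mathcal{I}}(e_0)\} \\
\quad \tilde r' = \{(e, \tilde\rho, \tilde a_\kappa) \mid (e, \tilde\rho, \_, \_, \tilde a_\kappa) \in \tilde s\} \\
\quad \tilde\sigma' = \bigsqcup_{(\_, \_, \tilde\sigma'', \_, \_) \in \tilde s} \tilde\sigma'' \\
\quad \tilde\sigma'_\kappa = \bigsqcup_{(\_, \_, \_, \tilde\sigma''_\kappa, \_) \in \tilde s} \tilde\sigma''_\kappa
\end{array}
$$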


In this definition, an underscore (wildcard) matches anything. The
height of the $\tilde R$ lattice is linear (as environments are monovariant)
and the heights of the store lattices are quadratic (as each global
store is strictly extended). Each extension of the store may require
$O(n)$ transitions because at any given store, we must transition
every configuration to be sure to obtain any changes to the store
or otherwise reach a fixed point. A traditional worklist algorithm
for computing a fixed point is thus cubic:

$$
O(\underbrace{(n + n^2 + n^2)}_{|\tilde R| + |\widetilde{\mathit{Store}}| + |\widetilde{\mathit{KStore}}|} \times n) = O(n^3)
$$

2.5 Stack Imprecision 

To illustrate the effect of an imprecise stack on data-flow and
control-flow precision, we first define a more precise 1-call-sensitive
(first-order, 1-CFA) allocator. A $k$-call-sensitive analysis
style differentiates bindings to a variable so they are unique to a
history of the last $k$ call sites reached before the binding. A
history of length $k = 1$ then allocates an address unique to the call
site immediately preceding the binding by using the following
allocator.

$$\mathit{alloc}_1(x, (e, \tilde\rho, \tilde\sigma, \tilde\sigma_\kappa, \tilde a_\kappa)) = (x, e)$$

Now, using $\mathit{alloc}_1$, consider the following snippet of code where
the variable id is already bound to (λ (x) x), whose body x we label $e_0$:

    ... (let ([y (id #t)])       ; e1
          (let ([z (id #f)])     ; e2
            ...))                ; e3

We number these expressions for ease of reference. For example, $e_2$
refers to the let-form that binds z, and $e_0$ to the return point of id.
We assume the starting configuration for this example is $(e_1, \tilde\rho, \tilde a_\kappa)$
where $\tilde\rho$ and $\tilde a_\kappa$ are the binding environment and continuation
address at the start of this code. We likewise let $\tilde\rho_\lambda$ be the environment
of id’s closure.

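The first call to id transitions to evaluate $e_0$ with the
continuation address $e_0$. This transition reaches the configuration
$(e_0, \tilde\rho_\lambda[x \mapsto (x, e_1)], e_0)$ and binds $(x, e_1)$ to #t and the
continuation address $e_0$ to the continuation $((y, e_2, \tilde\rho), \tilde a_\kappa)$, which gives us
the following stores:

$$
\begin{array}{l}
\tilde\sigma = \{(x, e_1) \mapsto \{\texttt{\#t}\}\} \\
\tilde\sigma_\kappa = \{e_0 \mapsto \{((y, e_2, \tilde\rho), \tilde a_\kappa)\}\}
\end{array}
$$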

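Next, id returns and transitions from $e_0$ to $e_2$, extending the
continuation’s environment to $\tilde\rho[y \mapsto (y, e_0)]$ and reinstating the
continuation address $\tilde a_\kappa$. This yields a configuration
$(e_2, \tilde\rho[y \mapsto (y, e_0)], \tilde a_\kappa)$. This transition binds $(y, e_0)$ to #t, giving us the
following stores:

$$
\begin{array}{l}
\tilde\sigma = \{(x, e_1) \mapsto \{\texttt{\#t}\},\ (y, e_0) \mapsto \{\texttt{\#t}\}\} \\
\tilde\sigma_\kappa = \{e_0 \mapsto \{((y, e_2, \tilde\rho), \tilde a_\kappa)\}\}
\end{array}
$$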

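Then the second call to id transitions to evaluate $e_0$ with the
continuation address $e_0$ once again (recall the definition of $\mathit{alloc}_{\kappa 0}$).
This transition reaches the configuration $(e_0, \tilde\rho_\lambda[x \mapsto (x, e_2)], e_0)$,
binding $(x, e_2)$ to #f and the continuation address $e_0$ to the
continuation $((z, e_3, \tilde\rho[y \mapsto (y, e_0)]), \tilde a_\kappa)$, giving us the following stores:

$$
\begin{array}{l}
\tilde\sigma = \{(x, e_1) \mapsto \{\texttt{\#t}\},\ (y, e_0) \mapsto \{\texttt{\#t}\},\ (x, e_2) \mapsto \{\texttt{\#f}\}\} \\
\tilde\sigma_\kappa = \{e_0 \mapsto \{((y, e_2, \tilde\rho), \tilde a_\kappa),\ ((z, e_3, \tilde\rho[y \mapsto (y, e_0)]), \tilde a_\kappa)\}\}
\end{array}
$$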
Next, id returns and transitions from $e_0$ to $e_3$, reinstating the
continuation address $\tilde a_\kappa$ and extending the continuation’s environment
to $\tilde\rho[y \mapsto (y, e_0)][z \mapsto (z, e_0)]$. Because $e_0$ is bound to two
continuations, this transition binds $(z, e_0)$ to #f while another spuriously
binds $(y, e_0)$ to #f, causing return-flow imprecision in the following
stores:

$$
\begin{array}{l}
\tilde\sigma = \{(x, e_1) \mapsto \{\texttt{\#t}\},\ (x, e_2) \mapsto \{\texttt{\#f}\},\ (y, e_0) \mapsto \{\texttt{\#t}, \texttt{\#f}\},\ (z, e_0) \mapsto \{\texttt{\#f}\}\} \\
\tilde\sigma_\kappa = \{e_0 \mapsto \{((y, e_2, \tilde\rho), \tilde a_\kappa),\ ((z, e_3, \tilde\rho[y \mapsto (y, e_0)]), \tilde a_\kappa)\}\}
\end{array}
$$
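This walkthrough can be replayed mechanically. The Python sketch below implements the store-widened analysis of section 2.4 with $\mathit{alloc}_1$ for values and, for comparison, two continuation allocators: $\mathit{alloc}_{\kappa 0}$ and a hypothetical `alloc_k1` that keys the address on both the target and the source expression. The tuple encoding, the literal form for #t/#f, the `"halt"` address, and the naive round-robin fixed-point loop are assumptions for illustration, and the allocators take only the parts of the transition they inspect rather than the full signatures above.

```python
from collections import defaultdict

# Hypothetical encoding (an assumption, not the paper's implementation):
#   ("var", x) | ("lam", x, body) | ("lit", v) | ("let", y, f, ae, body)
# Environments are frozensets of (variable, address) pairs so they can be hashed.

def extend(env, x, a):
    d = dict(env)
    d[x] = a
    return frozenset(d.items())

def aeval(ae, env, store):
    """Abstract atomic evaluation: returns a flow set."""
    if ae[0] == "var":
        return store[dict(env)[ae[1]]]
    if ae[0] == "lam":
        return frozenset({(ae, env)})
    return frozenset({ae[1]})                  # literal (#t/#f), an assumed extension

def alloc_1(x, e):                             # 1-call-sensitive value address (x, e)
    return (x, e)

def alloc_k0(e_target, e_source):              # continuation address = target expression
    return e_target

def alloc_k1(e_target, e_source):              # hypothetical (e', e) continuation address
    return (e_target, e_source)

def analyze(program, alloc, alloc_k):
    """Store-widened analysis: reachable configurations (e, env, a_k) plus
    one global value store and one global continuation store."""
    store, kstore = defaultdict(frozenset), defaultdict(frozenset)
    configs = {(program, frozenset(), "halt")}     # "halt" is never allocated

    def join(st, addr, values):
        grown = st[addr] | values
        changed = grown != st[addr]
        st[addr] = grown
        return changed

    changed = True
    while changed:                                 # iterate to a fixed point
        changed = False
        for e, env, ak in list(configs):
            succs = []
            if e[0] == "let":                      # [call]
                _, y, f, ae, body = e
                arg = aeval(ae, env, store)
                for clo in aeval(f, env, store):
                    if isinstance(clo, str):       # literals are not callable
                        continue
                    (_, x, lam_body), clo_env = clo
                    a = alloc(x, e)
                    changed |= join(store, a, arg)
                    ak2 = alloc_k(lam_body, e)
                    changed |= join(kstore, ak2, frozenset({((y, body, env), ak)}))
                    succs.append((lam_body, extend(clo_env, x, a), ak2))
            else:                                  # [return]
                val = aeval(e, env, store)
                for (x, body, frame_env), ak2 in kstore[ak]:
                    a = alloc(x, e)
                    changed |= join(store, a, val)
                    succs.append((body, extend(frame_env, x, a), ak2))
            for c in succs:
                if c not in configs:
                    configs.add(c)
                    changed = True
    return store

def flows(store, var):
    """Union of the flow sets at every (var, e) address."""
    return set().union(*(v for a, v in store.items() if a[0] == var))

ID = ("lam", "x", ("var", "x"))
BODY = ("let", "y", ("var", "id"), ("lit", "#t"),
        ("let", "z", ("var", "id"), ("lit", "#f"), ("var", "y")))
PROG = ("let", "r", ("lam", "id", BODY), ID, ("var", "r"))

print(sorted(flows(analyze(PROG, alloc_1, alloc_k0), "y")))  # ['#f', '#t']
print(sorted(flows(analyze(PROG, alloc_1, alloc_k1), "y")))  # ['#t']
```

With `alloc_k0`, the flow set for y picks up #f (and, symmetrically, z picks up #t), reproducing the spurious return flow; keying continuation addresses on the calling expression as well removes it for this example.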


The address $(y, e_0)$, representing y within $e_3$, maps to both #t and
#f, even though no concrete execution binds y to #f. A similar
pair of transitions from $(e_0, \tilde\rho_\lambda[x \mapsto (x, e_1)], e_0)$ (the second of
which is prompted by a change in the global continuation store at
the address $e_0$) causes the same conflation for z.

Clearly, one solution is to increase the context sensitivity of our
continuation allocator. Consider a continuation allocator $\mathit{alloc}_{\kappa 1}$
that, like $\mathit{alloc}_1$, uses a single call site of context and allocates
a continuation address $(e', e)$ formed from both the expression
being transitioned to, $e'$, and the expression being transitioned from,
$e$. This results in no spurious merging at return points because
continuations are kept as distinct as the 1-call-sensitive value-store
addresses we allocate.

It seems reasonable from here to suspect that perfect stack
precision could always be obtained through a sufficiently precise
strategy for polyvariant continuation allocation. The difficulty is
in knowing how to obtain this in the general case given an
arbitrary value-store allocation strategy. Given that CFA2 and PDCFA
promise a fixed method for implementing perfect stack precision,
albeit at significant engineering and run-time costs, can perfect
stack precision be implemented as a fixed, adaptive continuation
allocator? In this paper, we both answer this question in the
affirmative and show that this leads us not only to a trivial implementation
but to only a constant-factor increase in run-time complexity.


3. Perfect Stack Precision

We next formalize what is meant by a static analysis with perfect
stack precision by using an abstract abstract machine (AAM) […]
with unbounded stacks within each machine configuration. We then
review the existing polynomial-time methods for computing an
analysis with equivalent precision to this machine: PDCFA and
AAC.

3.1 Unbounded-Stack Analysis

In the same manner as previous work on this topic, we formalize
perfect stack precision using a static analysis that leaves the
structure of stacks fully unabstracted. Each frame of this unbounded
stack is itself abstract because its environment is abstract and
references the abstracted value store. States and configurations,
however, directly contain lists of such frames that are unbounded in
length. Environments, closures, stack frames, flow sets, and value
stores are otherwise abstracted in the same manner as the finite
machine of section 2.2. To differentiate this from the machines we
have seen so far, we call this an unbounded-stack machine.
Components unique to this machine wear hats:

$$
\begin{array}{rcll}
\hat\varsigma \in \hat\Sigma & \triangleq & \mathit{Exp} \times \widehat{\mathit{Env}} \times \widehat{\mathit{Store}} \times \widehat{\mathit{Kont}} & \text{[states]} \\
\hat\rho \in \widehat{\mathit{Env}} & \triangleq & \mathit{Var} \rightharpoonup \widehat{\mathit{Addr}} & \text{[environments]} \\
\hat\sigma \in \widehat{\mathit{Store}} & \triangleq & \widehat{\mathit{Addr}} \to \hat D & \text{[stores]} \\
\hat d \in \hat D & \triangleq & \mathcal{P}(\widehat{\mathit{Clo}}) & \text{[flow-sets]} \\
\widehat{\mathit{clo}} \in \widehat{\mathit{Clo}} & \triangleq & \mathit{Lam} \times \widehat{\mathit{Env}} & \text{[closures]} \\
\hat\kappa \in \widehat{\mathit{Kont}} & \triangleq & \widehat{\mathit{Frame}}^{*} & \text{[whole stacks]} \\
\hat\phi \in \widehat{\mathit{Frame}} & \triangleq & \mathit{Var} \times \mathit{Exp} \times \widehat{\mathit{Env}} & \text{[stack frames]} \\
\hat a \in \widehat{\mathit{Addr}} & & \text{is a finite set} & \text{[addresses]}
\end{array}
$$

Our atomic-expression evaluator works just as before:

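$\tilde{\mathcal{A}} : AExp \times \widetilde{Env} \times \widetilde{Store} \to \tilde{D}$
$\tilde{\mathcal{A}}(x, \tilde{\rho}, \tilde{\sigma}) = \tilde{\sigma}(\tilde{\rho}(x))$   [variable lookup]
$\tilde{\mathcal{A}}(lam, \tilde{\rho}, \tilde{\sigma}) = \{(lam, \tilde{\rho})\}$   [closure creation]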

As does a monovariant allocator: 

$\widetilde{alloc} : Var \times \tilde{\Sigma} \to \widetilde{Addr}$
$\widetilde{alloc}_0(x, \tilde{\varsigma}) = x$

This may be tuned to any other allocation strategy as easily as 
before. 

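We now define a non-deterministic unbounded-stack-machine transition relation $(\rightsquigarrow_{\tilde{\Sigma}}) \subseteq \tilde{\Sigma} \times \tilde{\Sigma}$ and a rule for call-site transitions:

$\tilde{\varsigma} \rightsquigarrow_{\tilde{\Sigma}} (e', \tilde{\rho}', \tilde{\sigma}', \tilde{\phi} : \tilde{\kappa})$, where
  $\tilde{\varsigma} = ((\mathtt{let}\ ([y\ (f\ \text{\ae})])\ e), \tilde{\rho}, \tilde{\sigma}, \tilde{\kappa})$
  $\tilde{\phi} = (y, e, \tilde{\rho})$
  $((\lambda\ (x)\ e'), \tilde{\rho}_\lambda) \in \tilde{\mathcal{A}}(f, \tilde{\rho}, \tilde{\sigma})$
  $\tilde{\rho}' = \tilde{\rho}_\lambda[x \mapsto \tilde{a}]$
  $\tilde{\sigma}' = \tilde{\sigma} \sqcup [\tilde{a} \mapsto \tilde{\mathcal{A}}(\text{\ae}, \tilde{\rho}, \tilde{\sigma})]$
  $\tilde{a} = \widetilde{alloc}(x, \tilde{\varsigma})$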

This is slightly simplified from its analogue in $(\rightsquigarrow_{\hat{\Sigma}})$. The definitions of $\tilde{\phi}$, $\tilde{\rho}'$, and $\tilde{\sigma}'$ are effectively identical. However, the continuation store and continuation address have been replaced with an unbounded stack $\tilde{\phi} : \tilde{\kappa}$.

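Likewise, the return transition changes to the following:

$\tilde{\varsigma} \rightsquigarrow_{\tilde{\Sigma}} (e, \tilde{\rho}', \tilde{\sigma}', \tilde{\kappa})$, where
  $\tilde{\varsigma} = (\text{\ae}, \tilde{\rho}, \tilde{\sigma}, \tilde{\phi} : \tilde{\kappa})$
  $\tilde{\phi} = (x, e, \tilde{\rho}_\kappa)$
  $\tilde{\rho}' = \tilde{\rho}_\kappa[x \mapsto \tilde{a}]$
  $\tilde{\sigma}' = \tilde{\sigma} \sqcup [\tilde{a} \mapsto \tilde{\mathcal{A}}(\text{\ae}, \tilde{\rho}, \tilde{\sigma})]$
  $\tilde{a} = \widetilde{alloc}(x, \tilde{\varsigma})$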

To follow a return transition, the stack must contain at least one frame. The frame's expression $e$ is then reinstated with its saved environment $\tilde{\rho}_\kappa$, extended with an address for $x$. The store is extended, and whatever stack tail lay beneath the popped frame becomes the successor's continuation $\tilde{\kappa}$.
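To make the two transition rules concrete, here is a minimal Python sketch of one step of the unbounded-stack machine for the restricted ANF language. It assumes a hypothetical tuple encoding of expressions and uses the monovariant allocator $\widetilde{alloc}_0$. Environments and stores are kept as sorted tuples so that states are hashable. None of these names come from the paper's implementation.

```python
# Hypothetical tuple encoding of the restricted ANF language:
#   ("var", x)                   variable reference (atomic)
#   ("lam", x, body)             (lambda (x) body)  (atomic)
#   ("let", y, f, ae, body)      (let ([y (f ae)]) body), with f and ae atomic

def extend(mapping, key, value):
    """Functional update of a map kept as a sorted tuple of pairs."""
    d = dict(mapping)
    d[key] = value
    return tuple(sorted(d.items()))

def join(store, addr, values):
    """Store join: store ⊔ [addr ↦ values]."""
    d = dict(store)
    d[addr] = d.get(addr, frozenset()) | values
    return tuple(sorted(d.items()))

def aeval(ae, env, store):
    """Atomic evaluator: variable lookup or closure creation."""
    if ae[0] == "var":
        return dict(store)[dict(env)[ae[1]]]
    return frozenset({(ae, env)})

def alloc0(x, state):
    """Monovariant allocator: every binding of x shares the address x."""
    return x

def step(state):
    """All successors of an unbounded-stack state (e, env, store, stack)."""
    e, env, store, stack = state
    succs = []
    if e[0] == "let":                                # call-site transition
        _, y, f, ae, body = e
        frame = (y, body, env)                       # phi = (y, e, rho)
        for (_, x, lam_body), lam_env in aeval(f, env, store):
            a = alloc0(x, state)
            succs.append((lam_body, extend(lam_env, x, a),
                          join(store, a, aeval(ae, env, store)),
                          (frame,) + stack))         # push phi
    elif stack:                                      # return transition
        (x, body, env_k), rest = stack[0], stack[1:]
        a = alloc0(x, state)
        succs.append((body, extend(env_k, x, a),
                      join(store, a, aeval(e, env, store)),
                      rest))                         # pop phi
    return succs

def run(program):
    """Follow transitions (deterministic for this example) until none applies."""
    state = (program, (), (), ())
    while True:
        succs = step(state)
        if not succs:
            return state
        state = succs[0]

ID = ("lam", "x", ("var", "x"))
A = ("lam", "a", ("var", "a"))
B = ("lam", "b", ("var", "b"))
# (let ([y (id a)]) (let ([z (id b)]) z)), with the identity inlined at each call
prog = ("let", "y", ID, A, ("let", "z", ID, B, ("var", "z")))
flows = dict(run(prog)[2])
print(len(flows["y"]), len(flows["z"]))  # 1 2
```

The monovariant value store merges both arguments at `x`. Even so, `y` receives only `a`'s closure, because the explicit stack returns the first call's result to `y`'s frame alone. A finite-state machine that allocated a single continuation address for the identity function's body would also return the second call's result to `y`'s frame.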
Unbounded-state injection is defined as we would expect:

$\tilde{\mathcal{I}} : Exp \to \tilde{\Sigma}$
$\tilde{\mathcal{I}}(e) = (e, \varnothing, \bot, \epsilon)$
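As before, we lift to obtain a monotonic naive collecting relation $(\rightsquigarrow_{\tilde{S}})$ for a program $e_0$. This relation is defined over sets of unbounded-stack states:

$\tilde{s} \in \tilde{S} = \mathcal{P}(\tilde{\Sigma})$

$\tilde{s} \rightsquigarrow_{\tilde{S}} \tilde{s}'$, where $\tilde{s}' = \{\tilde{\varsigma}' \mid \tilde{\varsigma} \in \tilde{s} \wedge \tilde{\varsigma} \rightsquigarrow_{\tilde{\Sigma}} \tilde{\varsigma}'\} \cup \{\tilde{\mathcal{I}}(e_0)\}$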

This analysis is approximate but remains incomputable, because the stack can grow without bound. Put another way, the height of the lattice $(\tilde{S}, \cup, \cap)$ is infinite, so no finite number of $(\rightsquigarrow_{\tilde{S}})$-iterations is guaranteed to reach a fixed point.

3.2 Store-Widened Unbounded-Stack Analysis 

As we will be comparing this unbounded-stack analysis to our new 
technique using precise store-allocated continuations, we derive a 
global-store-widened version as before: 

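$\tilde{\xi} \in \tilde{\Xi} = \tilde{R} \times \widetilde{Store}$   [state-spaces]
$\tilde{r} \in \tilde{R} = \mathcal{P}(\tilde{C})$   [reachable configs.]
$\tilde{c} \in \tilde{C} = Exp \times \widetilde{Env} \times \widetilde{Kont}$   [configurations]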

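A widened transfer function $(\rightsquigarrow_{\tilde{\Xi}})$ is defined in terms of $(\rightsquigarrow_{\tilde{\Sigma}})$ in exactly the same manner as $(\rightsquigarrow_{\hat{\Xi}})$ was derived from $(\rightsquigarrow_{\hat{\Sigma}})$. The only difference is that we now have a single global value store and no continuation store: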




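$(\tilde{r}, \tilde{\sigma}) \rightsquigarrow_{\tilde{\Xi}} (\tilde{r}', \tilde{\sigma}')$, where
  $\tilde{s} = \{\tilde{\varsigma} \mid (e, \tilde{\rho}, \tilde{\kappa}) \in \tilde{r} \wedge (e, \tilde{\rho}, \tilde{\sigma}, \tilde{\kappa}) \rightsquigarrow_{\tilde{\Sigma}} \tilde{\varsigma}\} \cup \{\tilde{\mathcal{I}}(e_0)\}$
  $\tilde{r}' = \{(e, \tilde{\rho}, \tilde{\kappa}) \mid (e, \tilde{\rho}, \tilde{\sigma}'', \tilde{\kappa}) \in \tilde{s}\}$
  $\tilde{\sigma}' = \bigsqcup_{(e, \tilde{\rho}, \tilde{\sigma}'', \tilde{\kappa}) \in \tilde{s}} \tilde{\sigma}''$

3.3 Pushdown Control-Flow Analysis (PDCFA)

Pushdown control-flow analysis (PDCFA) is a strategy for computably matching the precision of our unbounded-stack machine. It does so at a quadratic-factor increase to the complexity class of the underlying finite analysis (e.g., monovariant or 1-call-sensitive). This strategy tracks both reachable states (or, in the store-widened case, configurations) and the push or pop edges between them. The quadratic blowup comes from the fact that each pair of reachable states may have an explicitly tracked edge between them. As possible paths through the graph, these edges implicitly represent the stacks that the unbounded-stack machine represents explicitly. The graph precisely describes, as a regular language, all stacks reachable in the pushdown states of the unbounded-stack analysis.

PDCFA formalizes this as a Dyck state graph. Where a sequence of pushes may be repeated ad infinitum, a Dyck state graph represents it finitely, as a cycle of push edges and a cycle of pop edges. Broadly speaking, this is also how AAC and our adaptive continuation allocator work, except that such cycles are represented in the store instead of the state graph. A Dyck state graph is a state transition graph in which each edge is annotated with a frame push, a frame pop, or an epsilon. The set of continuations for a particular state in a Dyck state graph is determined by the pushes and pops along the paths that reach that state.

To formalize these Dyck state graphs, we reuse some components of our unbounded-stack machine. Because these machines are closely related, we continue with the same notation:

$\tilde{g} \in \tilde{G} = \tilde{V} \times \tilde{E} \times \widetilde{Store}$   [Dyck graph]
$\tilde{v} \in \tilde{V} = \mathcal{P}(\tilde{Q})$   [Dyck vertices]
$\tilde{q} \in \tilde{Q} = Exp \times \widetilde{Env}$   [Dyck configs.]
$\tilde{e} \in \tilde{E} = \mathcal{P}(\tilde{Q} \times \widetilde{Frame}_{\pm} \times \tilde{Q})$   [Dyck edges]
$\tilde{\phi}_{\pm} \in \widetilde{Frame}_{\pm} = \widetilde{Frame} \times \{push, pop\}$   [edge actions]

For readability, we style a push edge $(\tilde{q}, (\tilde{\phi}, push), \tilde{q}') \in \tilde{e}$ as $\tilde{q} \xrightarrow{\tilde{\phi}^{+}} \tilde{q}'$. Likewise, we style a pop edge $(\tilde{q}, (\tilde{\phi}, pop), \tilde{q}')$ as $\tilde{q} \xrightarrow{\tilde{\phi}^{-}} \tilde{q}'$.

It would be too verbose to formalize all the machinery required to compute a valid Dyck state graph. Instead, we define it from a completed unbounded-stack analysis. The function $E_{\tilde{\Xi}\tilde{G}} : \tilde{\Xi} \to \tilde{G}$ produces a Dyck state graph from a fixed point $\tilde{\xi}$ of $(\rightsquigarrow_{\tilde{\Xi}})$. The graph $\tilde{g} = E_{\tilde{\Xi}\tilde{G}}(\tilde{\xi})$ is a valid Dyck state graph analysis for a program $e_0$ when $\tilde{\xi}$ is the unbounded-stack analysis of $e_0$.

$E_{\tilde{\Xi}\tilde{G}}((\tilde{r}, \tilde{\sigma})) = (\tilde{v}, \tilde{e}, \tilde{\sigma})$, where
  $\tilde{v} = \{(e, \tilde{\rho}) \mid (e, \tilde{\rho}, \tilde{\kappa}) \in \tilde{r}\}$
  $\tilde{e} = \{(e, \tilde{\rho}) \xrightarrow{\tilde{\phi}^{+}} (e', \tilde{\rho}') \mid (e, \tilde{\rho}, \tilde{\kappa}) \in \tilde{r} \wedge (e, \tilde{\rho}, \tilde{\sigma}, \tilde{\kappa}) \rightsquigarrow_{\tilde{\Sigma}} (e', \tilde{\rho}', \tilde{\sigma}', \tilde{\phi} : \tilde{\kappa})\}$
  $\quad \cup\ \{(e, \tilde{\rho}) \xrightarrow{\tilde{\phi}^{-}} (e', \tilde{\rho}') \mid (e, \tilde{\rho}, \tilde{\phi} : \tilde{\kappa}) \in \tilde{r} \wedge (e, \tilde{\rho}, \tilde{\sigma}, \tilde{\phi} : \tilde{\kappa}) \rightsquigarrow_{\tilde{\Sigma}} (e', \tilde{\rho}', \tilde{\sigma}', \tilde{\kappa})\}$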

Although we do not formalize transition relations for Dyck state graphs themselves, it helps to illustrate the major source of additional complexity in engineering a PDCFA directly. In the finite-state analysis, a transition can trivially compute a set of stacks by looking up the current continuation address in the continuation store. In the unbounded-stack analysis, a transition can trivially compute the stack by looking at the final component of the state or configuration being transitioned. In a Dyck state graph, however, canceling sequences of pushes and pops may place the set of topmost stack frames on edges arbitrarily distant from the configuration $\tilde{q}$ being transitioned. In this way, the implicitness of stacks in a Dyck state graph obscures one of the most common operations needed to compute the analysis (i.e., stack introspection). As an example, observe how the topmost stack frame $\tilde{\phi}_0$ for $\tilde{q}_3$ is located elsewhere in the graph:

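$\tilde{q}_0 \xrightarrow{\tilde{\phi}_0^{+}} \tilde{q}_1 \xrightarrow{\tilde{\phi}_1^{+}} \tilde{q}_2 \xrightarrow{\tilde{\phi}_1^{-}} \tilde{q}_3$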

PDCFA therefore requires a non-trivial algorithm for stack introspection, and extra analysis machinery overall. Specifically, PDCFA requires the inductive maintenance of an epsilon closure graph in addition to the Dyck state graph, as seen in the following:


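$\tilde{q}_0 \xrightarrow{\tilde{\phi}_0^{+}} \tilde{q}_1 \xrightarrow{\tilde{\phi}_1^{+}} \tilde{q}_2 \xrightarrow{\tilde{\phi}_1^{-}} \tilde{q}_3$, with the epsilon edge $\tilde{q}_1 \xrightarrow{\epsilon} \tilde{q}_3$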

This structure makes every sequence of canceling stack actions explicit as an epsilon edge. As we will see, this epsilon closure graph represents unnecessary additional complexity for both the computer and the analysis developer.
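The epsilon closure, and the stack introspection it enables, can be sketched in a few lines of Python. This is a naive fixed-point formulation for illustration, not PDCFA's incremental algorithm. It runs on a hypothetical four-node graph in which $\tilde{q}_0$ pushes `phi0`, $\tilde{q}_1$ pushes `phi1`, and $\tilde{q}_2$ pops `phi1`. Finding the topmost frame at $\tilde{q}_3$ requires an epsilon edge that spans the canceled push and pop.

```python
def epsilon_closure(nodes, push, pop):
    """Least set of pairs (q, q') such that some path from q to q' has a
    balanced (net-zero) sequence of stack actions.

    push and pop are sets of edges (source, frame, target).
    """
    eps = {(q, q) for q in nodes}
    changed = True
    while changed:
        new = set()
        # Pushing frame f, then a balanced stretch, then popping f is balanced.
        for (a, f, b) in push:
            for (c, g, d) in pop:
                if f == g and (b, c) in eps:
                    new.add((a, d))
        # Balanced stretches compose.
        for (a, b) in eps:
            for (c, d) in eps:
                if b == c:
                    new.add((a, d))
        changed = not new <= eps
        eps |= new
    return eps

def top_frames(q, push, eps):
    """Frames that may be on top of the stack at q: those pushed on an edge
    a -f+-> b from which q is reachable along a balanced path."""
    return {f for (a, f, b) in push if (b, q) in eps}

nodes = {"q0", "q1", "q2", "q3"}
push = {("q0", "phi0", "q1"), ("q1", "phi1", "q2")}
pop = {("q2", "phi1", "q3")}
eps = epsilon_closure(nodes, push, pop)
print(sorted(e for e in eps if e[0] != e[1]))  # [('q1', 'q3')]
print(top_frames("q3", push, eps))             # {'phi0'}
```

Without the epsilon edge from `q1` to `q3`, the frame `phi0` pushed two edges earlier could not be recovered at `q3` in a single lookup.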
3.4 Precise Allocation of Continuations (AAC)

Abstracting abstract control (AAC) is another polynomial-time method for obtaining perfect stack precision. This technique works by store-allocating continuations, using addresses unique enough to ensure no spurious merging. Like PDCFA, it does not require foreknowledge of the polyvariance (e.g., context sensitivity) used in the value store. Its run-time complexity, however, is worse than PDCFA's quadratic-factor increase: in the monovariant, store-widened case, its authors believe it to be in $O(n^8)$. Even so, AAC makes perfect stack precision available for free in terms of development cost (i.e., labor).

Given the standard finite-state abstraction we built up in section 2.2, we can define AAC's essential strategy in a single line:

$\widehat{alloc}_{AAC}((e, \hat{\rho}, \hat{\sigma}, \hat{a}_\kappa), e', \hat{\rho}', \hat{\sigma}') = (e', \hat{\rho}', e, \hat{\rho}, \hat{\sigma})$

That is, continuations are stored at an address unique to the target state's expression $e'$ and environment $\hat{\rho}'$, as well as the source state's expression $e$, environment $\hat{\rho}$, and store $\hat{\sigma}$.

We have simplified AAC slightly and translated its notation to give this definition in the terms of our framework. A more faithful presentation of AAC reveals fundamental differences between their framework and ours. AAC uses an eval-apply semantics and explodes each flow set into a set of distinct states across every application. The exact address AAC proposes using is $(((\lambda\ (y)\ e'), \hat{\rho}_\lambda), \hat{d}_0, \hat{\sigma})$ (Figure 7 of the AAC paper). Here, $((\lambda\ (y)\ e'), \hat{\rho}_\lambda)$ is the target closure of an application, $\hat{d}_0$ is one particular abstract closure flowing to $y$, and $\hat{\sigma}$ is the value store in the source state. Our components $e'$ and $\hat{\rho}'$ are isomorphic to the target closure in the sense that $e'$ is identical and $\hat{\rho}'$ is produced from the combination of $\hat{\rho}_\lambda$ and $y$. The source state's components $(e, \hat{\rho}, \hat{\sigma})$ are not as specific as $\hat{d}_0$. They do, however, uniquely determine a flow set $\hat{d}$ (the result of $\hat{\mathcal{A}}$ invoked on $f$) that contains $\hat{d}_0$. To obtain a unique continuation address for every closure propagated across an application, a semantics with an eval-apply factoring like AAC's would be needed. This would have significantly complicated our presentation of the finite-state analysis. Moreover, we will see later that being specific to $\hat{d}_0$ adds run-time complexity to an analysis without adding any precision.

The intuition for AAC is that allocating continuations specific to both the source state and the target state of a call-site transition prevents any merging when returning according to this (transition-specific) continuation address. If we were to add some arbitrary additional context sensitivity (e.g., 3-call-sensitivity), this information would be encoded in $\hat{\rho}'$ and inherited by $\widehat{alloc}_{AAC}$ when it produces an address. Including this target-state binding environment in continuation addresses is the key reason AAC allocates precise continuation addresses.

In section 4, we will see that only the target state's expression $e'$ and environment $\hat{\rho}'$ are truly necessary for obtaining the perfect stack precision of our unbounded-stack machine. Including components of the transition's source state, its store, or its flow set only adds run-time complexity that is unnecessary for achieving perfect stack precision. This optimization extends AAC's core insight so that it is computationally for free while remaining precise and developmentally for free.


4. Perfect Stack Precision for Free 

The primary intuition of our work can be illustrated by considering a set of intraprocedural configurations $c_0$ through $c_5$ for some function invocation, as in the following:

$c_0 \to c_1 \to c_2 \to c_3 \to c_4 \to c_5$



The configuration $c_0$ represents the entry point to the function, and its incoming edge is a call-site transition. The configuration $c_5$ represents an exit point for the function, and its outgoing edge is a return-point transition. A transition where one intraprocedural configuration directly follows another, like $c_0 \to c_1$, is not technically possible in our restricted ANF language, though it would be in more general languages. The function's body may call other functions $f$ and $g$ whose configurations are not part of the same intraprocedural set of nodes. The primary insight behind our technique is that a set of intraprocedural configurations (like $c_0$ through $c_5$) necessarily shares the exact same set of genuine continuations (in this example, those of the incoming call sites for $c_0$).

We call the set of configurations $c_0$ through $c_5$ an intraprocedural group, because these configurations represent the body of a function for a single abstract invocation, defined by an entry point unique to some $e$ and $\hat{\rho}$. Our central insight is that an intraprocedural group also corresponds to the configurations that share a single set of continuations. Our finite-state machine represents this set of continuations with a continuation address. If that address is precise enough to uniquely determine an intraprocedural group's entry point ($e$ and $\hat{\rho}$), it can therefore be used for all configurations in the same group. Thus our allocator may be defined simply as:

$\widehat{alloc}_{P4F}((e, \hat{\rho}, \hat{\sigma}, \hat{a}_\kappa), e', \hat{\rho}', \hat{\sigma}') = (e', \hat{\rho}')$

The impact of this change is easily missed, belied by its simplicity. We allocate a continuation based only on the expression and environment at the entry point of each intraprocedural sequence of let-forms. That continuation is precisely reinstated when each of the calls in these let-forms returns.
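The following Python sketch contrasts $\widehat{alloc}_{P4F}$ with a monovariant continuation allocator on the identity-function example from the introduction. It assumes a hypothetical 1-call-sensitive value allocator that binds `x` at an address specific to each call site. States are reduced to what the allocators inspect, and the function and variable names are illustrative only.

```python
from collections import defaultdict

def alloc_mono(src, e_tgt, env_tgt):
    """A monovariant continuation allocator: one address per target expression."""
    return e_tgt

def alloc_p4f(src, e_tgt, env_tgt):
    """The P4F allocator: the target expression and environment, nothing else."""
    return (e_tgt, env_tgt)

def return_targets(alloc):
    """Map each call site of the identity function to the frames its return reaches."""
    # Two call sites: (let ([y (id #t)]) ...) and (let ([z (id #f)]) ...).
    # A 1-call-sensitive value allocator binds x at a call-site-specific address,
    # so the target environments differ. The source state `src` is ignored by
    # both allocators here; it is passed only to keep their signatures uniform.
    calls = [("site-y", "y"), ("site-z", "z")]
    kstore = defaultdict(set)
    addr_of = {}
    for site, result_var in calls:
        env_tgt = (("x", ("x", site)),)     # rho' = [x ↦ (x, site)]
        addr = alloc(site, "x", env_tgt)    # the body of id is just x
        kstore[addr].add(result_var)        # the frame awaiting the result
        addr_of[site] = addr
    # The exit of each invocation returns to every continuation at its address.
    return {site: sorted(kstore[addr]) for site, addr in addr_of.items()}

print(return_targets(alloc_mono))  # {'site-y': ['y', 'z'], 'site-z': ['y', 'z']}
print(return_targets(alloc_p4f))   # {'site-y': ['y'], 'site-z': ['z']}
```

With one address per target expression, each invocation's result spuriously flows to both `y` and `z`. Keying the address on $(e', \hat{\rho}')$ inherits the value allocator's call-site distinction, so each result reaches only its own caller.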

Recall that the monovariant continuation allocator in our example from section 2.3 resulted in return-flow merging. This happened because a single continuation address was used for transitions to the entry points of several different intraprocedural groups. More generally, return-flow merging occurs in a finite-state analysis when, at some return-point configuration $(\text{\ae}, \hat{\rho}_{\text{\ae}}, \hat{a}_\kappa)$, the set of continuations for $\hat{a}_\kappa$ is less precise than the set of source configurations that transition to the entry point $(e, \hat{\rho})$ of the same intraprocedural group. We allocate a continuation address specific to this exact entry point, and that address is propagated by shallowly copying it to each return point of the same intraprocedural group. Consequently, the set of continuations is always as precise as the set of source configurations transitioning to that entry point. The return-flow merging problem therefore cannot occur when using $\widehat{alloc}_{P4F}$, and there is no run-time overhead for stack introspection.

In section 5, we formalize these intuitions and prove that, when using $\widehat{alloc}_{P4F}$, our unbounded-stack analysis simulates (i.e., is no more precise than) a finite-state analysis.

4.1 Complexity 

To see why this allocation scheme leads to only a constant-factor overhead, consider a set of configurations $c_0, c_1, \ldots, c_n$ that form an intraprocedural group. Suppose a set of call sites transitions to $c_0$ with the continuations $\hat{\kappa}_0, \hat{\kappa}_1, \ldots, \hat{\kappa}_{m-1}$. Diagrammatically, $m$ call edges enter $c_0$, and for each of them a return edge leaves $c_n$.



Note that, for each call site, there is a corresponding return flow using the same continuation. Under our allocation strategy, all of the configurations $c_0, c_1, \ldots, c_n$ use the same continuation address $(e, \hat{\rho})$. The global continuation store then maps this address to the set $\{\hat{\kappa}_0, \hat{\kappa}_1, \ldots, \hat{\kappa}_{m-1}\}$.

Now consider what must be done if a new call site transitions to $c_0$. First, the continuation for this new call site, say $\hat{\kappa}_m$, must be added to the continuation set at the address $(e, \hat{\rho})$. Then the corresponding return-edge transitions must be added. Note that none of $c_0, c_1, \ldots, c_{n-1}$ needs to be modified or accessed. Beyond the work of the underlying analysis, the only work done is adding $\hat{\kappa}_m$ at $(e, \hat{\rho})$ and adding a corresponding return edge at return points. Thus, the additional work is a constant factor of the number of times a continuation is added to the continuation store.

A naive analysis might lead us to conclude that this is bounded 
by the product of the number of continuation addresses and the 
number of continuations. However, there is a tighter bound. Each 
transition adds only one continuation to the continuation store. 
Thus the work done is a constant factor of the number of transitions 
in the underlying analysis. 
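A small Python sketch of this bookkeeping follows. It assumes a single exit point and uses hypothetical names for configurations and continuations. Because the nodes of the group store only the shared address, adding $m$ call sites costs $m$ continuation-store additions and $m$ return edges, no matter how many nodes the group has.

```python
class KStore:
    """A continuation store keyed by P4F addresses (e, rho). It counts how many
    continuations it has added, which is the only work P4F does beyond the
    underlying analysis."""

    def __init__(self):
        self.table = {}
        self.additions = 0

    def add(self, addr, kont):
        konts = self.table.setdefault(addr, set())
        if kont in konts:
            return False
        konts.add(kont)
        self.additions += 1
        return True

def analyze_group(n_nodes, n_callers):
    """A hypothetical group of n_nodes configurations reached from n_callers call sites."""
    entry = ("body", "rho")                              # shared address (e, rho)
    group = [(f"c{i}", entry) for i in range(n_nodes)]   # each node holds only the address
    ks = KStore()
    return_edges = 0
    for k in range(n_callers):
        if ks.add(entry, f"kappa{k}"):   # one continuation per new call site...
            return_edges += 1            # ...and one return edge from the single exit
    # Every node sees all continuations through the shared address, at no extra cost.
    assert all(len(ks.table.get(addr, ())) == n_callers for _, addr in group)
    return ks.additions, return_edges

for n in (2, 20, 200):
    print(n, analyze_group(n, n_callers=5))  # (5, 5) regardless of n
```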

This differs from AAC, which may make duplicate copies of the continuation set for an intraprocedural group: it produces one for each combination of components $e$, $\hat{\rho}$, and $\hat{\sigma}$ drawn from the source states transitioning to it. As a consequence, AAC allocates addresses strictly more unique than the target configuration $(e', \hat{\rho}')$. Two different source expressions $e_0$ and $e_1$ may both have transitions to $(e', \hat{\rho}')$. AAC will nonetheless produce two different target configurations $\hat{c}_0$ and $\hat{c}_1$, because the continuation addresses they allocate will be distinct. This difference is maintained through the two variants of the function starting at $e'$ with environment $\hat{\rho}'$. When an exit point $\text{\ae}$ is reached in each, the expression and its environment are the same, and they propagate the same values to two sets of continuations. Thus, these continuation addresses and the sets of stacks they represent are kept separate without any benefit.
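To see the duplication concretely, the Python sketch below feeds two hypothetical source states into both allocators, reducing states to opaque labels. AAC yields two target configurations for one entry point, while $\widehat{alloc}_{P4F}$ yields one.

```python
def alloc_aac(src, e_tgt, env_tgt):
    """AAC-style address: the target (e', rho') plus the source (e, rho, sigma)."""
    e, env, store, _a_k = src
    return (e_tgt, env_tgt, e, env, store)

def alloc_p4f(src, e_tgt, env_tgt):
    """P4F address: the target (e', rho') only."""
    return (e_tgt, env_tgt)

# Two hypothetical source states with different expressions (e0, e1) that
# call into the same target expression and environment (e', rho').
sources = [("e0", "rho0", "sigma", "a_halt"), ("e1", "rho1", "sigma", "a_halt")]
target = ("e'", "rho'")

def target_configs(alloc):
    """Distinct target configurations (e', rho', a_kappa) produced by the calls."""
    return {target + (alloc(src, *target),) for src in sources}

print(len(target_configs(alloc_aac)))  # 2: the function body is analyzed twice
print(len(target_configs(alloc_p4f)))  # 1: one shared intraprocedural group
```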

PDCFA, on the other hand, is more complex for an entirely different reason: the epsilon closure graph. Without it, PDCFA has no way to efficiently determine a topmost stack frame at each return transition. Both our method and AAC make this trivial by explicitly propagating an address to each state. Our method allows a continuation address to be shallowly propagated across each intraprocedural node in a function. The epsilon closure graph, by contrast, recomputes a separate set of incoming epsilon edges for every node. The number of such edges for a given entry point $(e, \hat{\rho})$ is therefore the number of callers times the number of intraprocedural nodes. This is a quadratic blowup from the number of nodes in a finite-state model. It is why monovariant store-widened PDCFA is in $O(n^6)$ instead of in $O(n^3)$ like traditional 0-CFA. Each intraprocedural node following an entry point $(e, \hat{\rho})$ shares the same set of continuations (i.e., the same epsilon edges). We exploit this insight naturally by propagating a pointer to this set instead of rebuilding it for each node. PDCFA cannot exploit this insight without added machinery that propagates only a shallow copy of an incoming epsilon-edge set intraprocedurally. This insight could likely also be imported into the PDCFA style of analysis, yielding a variant of PDCFA with only a constant-factor overhead, but this would require additional machinery.
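The difference in bookkeeping can be tallied with a deliberately simplified Python count. It models only the per-entry-point edge sets described above, not a full PDCFA. PDCFA's incoming epsilon edges grow with callers times nodes, while P4F's continuation entries grow with callers alone.

```python
def pdcfa_epsilon_edges(n_callers, n_nodes):
    """Incoming epsilon edges PDCFA records for one entry point (e, rho):
    one per (caller, intraprocedural node) pair."""
    return {(c, q) for c in range(n_callers) for q in range(n_nodes)}

def p4f_continuations(n_callers, n_nodes):
    """P4F stores each caller's continuation once, at the shared address (e, rho);
    n_nodes deliberately plays no role."""
    return {("e", "rho", c) for c in range(n_callers)}

for m, n in [(4, 10), (40, 100)]:
    print(m, n, len(pdcfa_epsilon_edges(m, n)), len(p4f_continuations(m, n)))
# 4 10 40 4
# 40 100 4000 40
```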


4.2 Constant Overhead Requires Store Widening 

That no function can have two entry points that lead to the same 
exit point is a genuine restriction worth discussing further. If this 
were not true, our technique would be precise (assuming multiple 
entry points are not merged), but it would not necessarily be a 
constant-factor increase in complexity. The combination of no store 
widening (per-state value stores) and mutation is a good example 
of how this situation could arise. 

To see how per-state stores can cause a further blowup in complexity, consider a function that is called with two different continuations and two different stores. Without store widening, each store causes a different state to be created for the entry point $(e, \hat{\rho})$ of the function. For example, let $c_1$ be the state for the entry point with one store and $c_1'$ the state for the entry point with the other store.

Now suppose that along both sequences of states there is a call to some function $f$, and that $f$ contains a side effect that causes the previously different value stores to become equal. For example, in $c_1$ the address for $x$ might map to {#t} while in $c_1'$ it maps to {#f}. If $x$ becomes bound to {#t, #f} along both paths in the body of $f$, the stores along both paths would become identical.

A problem now arises. Should $f$ return to only one state using this common store, such as $c_3$, or to two different states with identical stores, such as both $c_3$ and $c_3'$? Either choice has drawbacks. The semantics we have given would naturally yield the latter option, producing two distinct states that differ only in their continuation addresses (their original entry points). Because these states are otherwise identical, splitting $\hat{\kappa}_1$ and $\hat{\kappa}_2$ into sets represented by two different continuation addresses results in additional transitions and complexity without any benefit. Arguably, these continuation sets should be merged and represented by a single address. That corresponds to the former option and could reduce run-time complexity, but only at the cost of additional analysis machinery. Per-state stores are therefore incompatible with our goal of obtaining perfect stack precision for free in both senses (running time and human labor).

4.3 Implementation 

We have implemented both our technique and AAC's technique for the analysis of a simplified Scheme intermediate language. This language extends Exp with a variety of additional core forms, including conditionals, mutation, recursive binding, tail calls, and a library of primitive operations. Our implementation was written in Scala and executed using Scala 2.11 for OS X on an Intel Core i5 (1.3 GHz) with 4 GB of RAM. It is built upon the implementation of Earl et al., which implements both traditional k-CFA and PDCFA. Our test cases came from two sources: the Larceny R6RS benchmark suite (ack, cpstak, tak) and examples compiled from the previous literature on obtaining perfect stack precision (mj09, eta, kcfa2, kcfa3, blur, loop2, sat). As a sanity check, we verified that AAC and our method produce results of equivalent precision in every case. We ran each comparison using both a monovariant value-store allocator and a 1-call-sensitive polyvariant allocator, and each configuration is reported in its own figure. Across the board, our method requires visiting strictly fewer machine configurations. In some cases the difference is small, but in others it is significant. We saw as much as a 16.0× improvement in the monovariant analysis and as much as a 10.4× improvement in the context-sensitive analysis. The mean speedup in terms of states visited was 5.4× in the monovariant analyses and 4.9× in the context-sensitive analyses.

5. Proof of Precision 

Proving soundness is fairly straightforward, as discussed in section 2.3, but proving precision poses a greater challenge. To do this, we first define a simulation relation $(\sqsubseteq_{\Xi})$. We write $\hat{\xi} \sqsubseteq_{\Xi} \tilde{\xi}$ (read as "$\tilde{\xi}$ simulates $\hat{\xi}$") if and only if all stored values and machine configurations in $\hat{\xi}$ are accounted for in the unbounded-stack representation $\tilde{\xi}$, including the stacks implicit in those configurations. Usually, the next step in such a proof would be to show that taking parallel steps preserves precision, as in fallacy 1.

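Fallacy 1 (Steps preserve precision). If $\hat{\xi} \rightsquigarrow_{\hat{\Xi}} \hat{\xi}'$ and $\tilde{\xi} \rightsquigarrow_{\tilde{\Xi}} \tilde{\xi}'$, then $\hat{\xi} \sqsubseteq_{\Xi} \tilde{\xi}$ implies $\hat{\xi}' \sqsubseteq_{\Xi} \tilde{\xi}'$.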

However, fallacy 1 is not true. After some finite number of steps, $\hat{\xi}$ may contain a cycle in its continuation store. For precision to hold, an infinite family of successively longer stacks must then also be in $\tilde{\xi}$. After a finite number of steps, however, all stacks in $\tilde{\xi}$ are bounded by a finite length. Hence, there are stacks that precision says should be in $\tilde{\xi}$ but that are not.

We thus take a different approach to proving precision. Before going into the details, here is a high-level overview of the proof. Instead of stepping both $\hat{\xi}$ and $\tilde{\xi}$ in parallel, we show that successive steps of $\hat{\xi}$ are all precise relative to any $\tilde{\xi}$ that is already at a fixed point (theorem 11, found at the end of this section). To show this, we need two inductions: one over the steps taken by $\hat{\xi}$ and the other over the stacks implied by $\hat{\xi}$. To separate these inductions, we define a well-formedness property ($wf$ in figure 7) with two key features. First, we can show that it is preserved by iterative steps from an initial result (lemma 5). Second, we can show that any well-formed $\hat{\xi}$ is precise relative to any $\tilde{\xi}$ that is at a fixed point (lemmas 6 through 10).

[Figure 4: a diagram connecting $(e, \hat{\rho}, \hat{a}_\kappa) \in \hat{r}$ and an implied stack $\psi \in_{\Psi} \hat{a}_\kappa$ (via $\hat{\sigma}_\kappa$) to paths that start from $(c_0, \epsilon)$, with each edge labeled by the lemma that establishes it.]
Figure 4. The logical chain proving lemma 10. Assumes $(\tilde{r}, \tilde{\sigma})$ is at a fixed point and $(\hat{r}, \hat{\sigma}, \hat{\sigma}_\kappa)$ is well-formed.

The well-formedness property is defined in terms of two additional concepts. First, we formally define the stacks $\psi$ implied by a continuation address $\hat{a}_\kappa$ and a continuation store $\hat{\sigma}_\kappa$, using a relation $\psi \in_{\Psi} \hat{a}_\kappa$ (via $\hat{\sigma}_\kappa$). Second, we define paths, $(\rightsquigarrow^{*})$ through $\tilde{\xi}$ and $(\hookrightarrow^{*})$ through $\hat{\xi}$. The former is a sequence of unbounded-stack steps within $\tilde{\xi}$. The latter is a sequence of steps between states represented by configurations in $\hat{\xi}$, each paired with an implied stack.

These concepts allow us to prove the precision of any well-formed $\hat{\xi}$ (lemma 10) through the logical chain informally shown in figure 4. Lemma 6 concerns any configuration $(e, \hat{\rho}, \hat{a}_\kappa)$ in the $\hat{r}$ of a $\hat{\xi}$ and any $\psi$ implied by $\hat{a}_\kappa$ with the continuation store $\hat{\sigma}_\kappa$ of $\hat{\xi}$. It shows that there exists a path from the initial configuration $c_0$ with an empty stack $\epsilon$ to the configuration $(e, \hat{\rho}, \hat{a}_\kappa)$ with the implied stack $\psi$. In lemma 7, we then show that there exists a corresponding path in $\tilde{\xi}$ from $(e_0, \varnothing, \epsilon)$ to $H_C((e, \hat{\rho}, \hat{a}_\kappa), \psi)$. Finally, we show that the endpoint of that path is in $\tilde{r}$. Thus the set of reachable configurations in $\hat{\xi}$ is precise relative to $\tilde{\xi}$ (lemma 10).

5.1 Definitions and Assumptions

In order to prove precision, we first require that the address spaces for $\hat{a}$ and $\tilde{a}$ correspond as follows.

Assumption 2 (Address equivalence). There exists an equivalence $(\equiv_{Addr})$ between finite-state-machine addresses ($\widehat{Addr}$) and unbounded-stack-machine addresses ($\widetilde{Addr}$) that can be decomposed into a bijection $H_{Addr} : \widehat{Addr} \to \widetilde{Addr}$.
[Figure 5 defines finite-state paths $((e, \hat{\rho}, \hat{a}_\kappa), \psi) \hookrightarrow^{*} ((e', \hat{\rho}', \hat{a}_\kappa'), \psi')$ (via $\hat{\xi}$, $\hat{\xi}'$) by three rules. The base case relates any $(e, \hat{\rho}, \hat{a}_\kappa) \in \hat{r}$ with $\psi \in_{\Psi} \hat{a}_\kappa$ (via $\hat{\sigma}_\kappa$) to itself. The call case extends a path ending at a call site $((\mathtt{let}\ ([x\ (f\ \text{\ae})])\ e''), \hat{\rho}'', \hat{a}_\kappa'')$ by a step that pushes the continuation $((x, e'', \hat{\rho}''), \hat{a}_\kappa'')$. The return case extends a path ending at an atomic expression by a step that pops a continuation $((x, e', \hat{\rho}_\kappa), \hat{a}_\kappa')$ and binds $x$.]
Figure 5. Finite-state paths


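From this we can build up equivalences ($\equiv_{Env}$, $\equiv_{Frame}$, $\equiv_{Clo}$) and precision relations ($\sqsubseteq_{\Xi}$, $\sqsubseteq_{Store}$, $\sqsubseteq_{R}$, $\sqsubseteq_{D}$) for all the components of our machine. In addition, we can define conversion bijections ($H_{Env}$, $H_{Frame}$, $H_{Clo}$, $H_{D}$, $H_{Store}$) for most, but not all, of the components. These relations have the following signatures:

$(\sqsubseteq_{\Xi}) \subseteq \hat{\Xi} \times \tilde{\Xi}$   [state-space precision]
$(\sqsubseteq_{Store}) \subseteq \widehat{Store} \times \widetilde{Store}$   [store precision]
$(\sqsubseteq_{R}) \subseteq \hat{R} \times \tilde{R} \times \widehat{KStore}$   [reachable configs. precision]
$(\equiv_{Env}) \subseteq \widehat{Env} \times \widetilde{Env}$   [env. equivalence]
$(\equiv_{Frame}) \subseteq \widehat{Frame} \times \widetilde{Frame}$   [frame equivalence]
$(\sqsubseteq_{D}) \subseteq \hat{D} \times \tilde{D}$   [flow-set precision]
$(\equiv_{Clo}) \subseteq \widehat{Clo} \times \widetilde{Clo}$   [closure equivalence]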

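In addition, the following assumption requires that the value allocators respect the address correspondence.

Assumption 3 (Allocation equivalence). If $\hat{\rho} \equiv_{Env} \tilde{\rho}$, $\hat{\sigma} \sqsubseteq_{Store} \tilde{\sigma}$, and $\psi \in_{\Psi} \hat{a}_\kappa$ (via $\hat{\sigma}_\kappa$), then:

$\widetilde{alloc}(x, (e, \tilde{\rho}, \tilde{\sigma}, H_{Kont}(\psi))) \equiv_{Addr} \widehat{alloc}(x, (e, \hat{\rho}, \hat{\sigma}, \hat{\sigma}_\kappa, \hat{a}_\kappa))$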

This assumption uses $(\in_{\Psi})$ and $H_{Kont}$, which deal with the stacks implied by an address and continuation store. We define an implied stack as an unbounded list of finite-state continuations $\hat{\kappa}$:

$\psi \in \Psi = \widehat{Kont}^{*}$   [implied stack]


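These $\psi$ are an intermediate representation. Like $\tilde{\kappa}$, their structure is unbounded, but each element is taken directly from the finite-state machine. We define a binary relation $(\in_{\Psi})$ that specifies which $\psi$ are implied by an $\hat{a}_\kappa$ in $\hat{\sigma}_\kappa$. This relation has the following base case and inductive case:

$\epsilon \in_{\Psi} \hat{a}_{halt}$ (via $\hat{\sigma}_\kappa$)

$(\hat{\phi}, \hat{a}_\kappa') \in \hat{\sigma}_\kappa(\hat{a}_\kappa) \wedge \psi \in_{\Psi} \hat{a}_\kappa'$ (via $\hat{\sigma}_\kappa$) $\wedge\ \hat{a}_\kappa \neq \hat{a}_{halt} \implies ((\hat{\phi}, \hat{a}_\kappa') : \psi) \in_{\Psi} \hat{a}_\kappa$ (via $\hat{\sigma}_\kappa$)

$(e, \tilde{\rho}, \tilde{\kappa}) \rightsquigarrow^{*} (e, \tilde{\rho}, \tilde{\kappa})$ (via $\tilde{r}, \tilde{\sigma}$)
$(e, \tilde{\rho}, \tilde{\kappa}) \rightsquigarrow^{*} (e', \tilde{\rho}', \tilde{\kappa}')$ (via $\tilde{r}, \tilde{\sigma}$), where
  $(e, \tilde{\rho}, \tilde{\kappa}) \rightsquigarrow^{*} (e'', \tilde{\rho}'', \tilde{\kappa}'')$ (via $\tilde{r}, \tilde{\sigma}$)
  $(e'', \tilde{\rho}'', \tilde{\sigma}, \tilde{\kappa}'') \rightsquigarrow^{\sqsubseteq}_{\tilde{\Sigma}} (e', \tilde{\rho}', \tilde{\sigma}, \tilde{\kappa}')$
Figure 6. Unbounded-stack paths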

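Then, given such a $\psi$, we can directly construct its equivalent unbounded stack:

$H_{Kont}(\epsilon) = \epsilon$
$H_{Kont}((\hat{\phi}, \hat{a}_\kappa) : \psi) = H_{Frame}(\hat{\phi}) : H_{Kont}(\psi)$

Also, given a finite-state configuration $\hat{c}$ and an implied stack $\psi$, we can construct an unbounded-stack configuration $\tilde{c}$:

$H_C((e, \hat{\rho}, \hat{a}_\kappa), \psi) = (e, H_{Env}(\hat{\rho}), H_{Kont}(\psi))$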
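The relation $(\in_{\Psi})$ and the conversion $H_{Kont}$ can be sketched in Python by enumerating implied stacks up to a length bound. A bound is needed because a cycle in the continuation store implies infinitely many stacks, which is exactly the situation that defeats fallacy 1. Addresses and frames here are hypothetical strings, and `h_frame` defaults to the identity in place of $H_{Frame}$.

```python
HALT = "a_halt"

def implied_stacks(addr, kstore, max_len):
    """All implied stacks psi with psi ∈_Ψ addr (via kstore), up to max_len.

    kstore maps a continuation address to a set of continuations
    (frame, next_addr); a stack is a tuple of continuations, top first.
    """
    if addr == HALT:
        return {()}                    # base case: only ε is implied by a_halt
    if max_len == 0:
        return set()
    stacks = set()
    for frame, next_addr in kstore.get(addr, ()):
        for rest in implied_stacks(next_addr, kstore, max_len - 1):
            stacks.add(((frame, next_addr),) + rest)
    return stacks

def h_kont(psi, h_frame=lambda frame: frame):
    """H_Kont: keep each continuation's frame, converted by h_frame."""
    return tuple(h_frame(frame) for frame, _ in psi)

# A continuation store with a cycle: address "a" may return to itself.
kstore = {"a": {("phi1", "a"), ("phi0", HALT)}}
for n in range(1, 4):
    print(n, sorted(h_kont(psi) for psi in implied_stacks("a", kstore, n)))
```

Each increase of the bound adds one more stack, $\texttt{phi1}^{j}\,\texttt{phi0}$, so no finite prefix of an unbounded-stack iteration can contain them all.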

Next, in figures 5 and 6, we define paths to configurations. The relation $(\rightsquigarrow^{*})$ is defined by two cases: a base case from a configuration to itself, and a recursive case that extends an existing path with a step within $\tilde{\xi}$. That step uses a variation of the step relation, defined in figure 8, that allows the output store of the step to be a sub-store of the store in $\tilde{\xi}$. The $(\hookrightarrow^{*})$ relation is similar, but adds extra side conditions that ensure invariants used in our proof.

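Then, in figure 7, we define well-formedness. This is a binary predicate whose first argument $\hat{\xi}$ is the predecessor of its second argument $\hat{\xi}'$, the result we say is well-formed. The predicate is defined in terms of several sub-properties:

- $wf_{\rightsquigarrow}$ requires that $\hat{\xi}$ be well-formed and be the predecessor of $\hat{\xi}'$.
- $wf_{\sqsubseteq}$ requires that $\hat{\xi}'$ be component-wise greater than or equal to $\hat{\xi}$.
- $wf_{init}$ requires that the initial configuration be in $\hat{r}'$.
- $wf_{halt}$ requires that the halt-continuation address $\hat{a}_{halt}$ have no continuations associated with it.
- $wf_r$, $wf_\sigma$, and $wf_{\sigma_\kappa}$ ensure that everything in the $\hat{r}'$, $\hat{\sigma}'$, and $\hat{\sigma}_\kappa'$ of $\hat{\xi}'$ has a reason to be there. For $wf_r$, this means that every element of $\hat{r}'$ has some path leading to it. For $wf_\sigma$ and $wf_{\sigma_\kappa}$, it means that every value stored in $\hat{\sigma}'$ or $\hat{\sigma}_\kappa'$ has some step $(\rightsquigarrow_{\hat{\Sigma}})$ that put it there. These two properties are defined in terms of the variations of $(\rightsquigarrow_{\hat{\Sigma}})$ in figure 8, which have side conditions about the contents of the store.

For $wf_{\sigma_\kappa}$, we define $Entry$. It maps a continuation address to the configuration that is the entry point of the function invocation containing the configurations that use that address.

$wf(\hat{\xi}, \hat{\xi}') = (\hat{\xi} = \hat{\xi}' = (\{(e_0, \varnothing, \hat{a}_{halt})\}, \bot, \bot)) \vee (wf_{\rightsquigarrow}(\hat{\xi}, \hat{\xi}') \wedge wf_{\sqsubseteq}(\hat{\xi}, \hat{\xi}') \wedge wf_{init}(\hat{\xi}, \hat{\xi}') \wedge wf_{halt}(\hat{\xi}, \hat{\xi}') \wedge wf_{r}(\hat{\xi}, \hat{\xi}') \wedge wf_{\sigma}(\hat{\xi}, \hat{\xi}') \wedge wf_{\sigma_\kappa}(\hat{\xi}, \hat{\xi}'))$
$wf_{\rightsquigarrow}(\hat{\xi}, \hat{\xi}') = \hat{\xi} \rightsquigarrow_{\hat{\Xi}} \hat{\xi}' \wedge \exists \hat{\xi}''.\ wf(\hat{\xi}'', \hat{\xi})$
$wf_{\sqsubseteq}((\hat{r}, \hat{\sigma}, \hat{\sigma}_\kappa), (\hat{r}', \hat{\sigma}', \hat{\sigma}_\kappa')) = (\hat{r} \subseteq \hat{r}') \wedge (\hat{\sigma} \sqsubseteq \hat{\sigma}') \wedge (\hat{\sigma}_\kappa \sqsubseteq \hat{\sigma}_\kappa')$
$wf_{init}(\hat{\xi}, (\hat{r}', \hat{\sigma}', \hat{\sigma}_\kappa')) = (e_0, \varnothing, \hat{a}_{halt}) \in \hat{r}'$
$wf_{halt}(\hat{\xi}, (\hat{r}', \hat{\sigma}', \hat{\sigma}_\kappa')) = \forall \hat{\kappa}.\ \hat{\kappa} \notin \hat{\sigma}_\kappa'(\hat{a}_{halt})$
$wf_{r}(\hat{\xi}, (\hat{r}', \hat{\sigma}', \hat{\sigma}_\kappa')) = \forall (e, \hat{\rho}, \hat{a}_\kappa) \in \hat{r}'.\ \exists \psi \in_{\Psi} \hat{a}_\kappa\ (\text{via } \hat{\sigma}_\kappa').\ ((e_0, \varnothing, \hat{a}_{halt}), \epsilon) \hookrightarrow^{*} ((e, \hat{\rho}, \hat{a}_\kappa), \psi)\ (\text{via } \hat{\xi}, (\hat{r}', \hat{\sigma}', \hat{\sigma}_\kappa'))$
$wf_{\sigma}((\hat{r}, \hat{\sigma}, \hat{\sigma}_\kappa), (\hat{r}', \hat{\sigma}', \hat{\sigma}_\kappa')) = \forall \hat{a}.\ \forall \widehat{clo}_0 \in \hat{\sigma}'(\hat{a}).\ \exists (e, \hat{\rho}, \hat{a}_\kappa) \in \hat{r}.\ \exists (e', \hat{\rho}', \hat{a}_\kappa') \in \hat{r}'.\ (e, \hat{\rho}, \hat{\sigma}, \hat{\sigma}_\kappa, \hat{a}_\kappa) \rightsquigarrow^{\sigma} (e', \hat{\rho}', \hat{\sigma}', \hat{\sigma}_\kappa', \hat{a}_\kappa')\ (\text{with } \hat{a}, \widehat{clo}_0)$
$wf_{\sigma_\kappa}((\hat{r}, \hat{\sigma}, \hat{\sigma}_\kappa), (\hat{r}', \hat{\sigma}', \hat{\sigma}_\kappa')) = \forall \hat{a}_\kappa'.\ \forall ((x, e, \hat{\rho}_\kappa), \hat{a}_\kappa) \in \hat{\sigma}_\kappa'(\hat{a}_\kappa').\ \exists e_\kappa'.\ \exists \hat{\rho}_\kappa'.\ \hat{a}_\kappa' = (e_\kappa', \hat{\rho}_\kappa') \wedge Entry(\hat{a}_\kappa') \in \hat{r}'$
  $\wedge\ \exists f.\ \exists \text{\ae}.\ ((\mathtt{let}\ ([x\ (f\ \text{\ae})])\ e), \hat{\rho}_\kappa, \hat{a}_\kappa) \in \hat{r}$
  $\wedge\ ((\mathtt{let}\ ([x\ (f\ \text{\ae})])\ e), \hat{\rho}_\kappa, \hat{\sigma}, \hat{\sigma}_\kappa, \hat{a}_\kappa) \rightsquigarrow^{\kappa} (e_\kappa', \hat{\rho}_\kappa', \hat{\sigma}', \hat{\sigma}_\kappa', \hat{a}_\kappa')\ (\text{with } \hat{a}_\kappa', ((x, e, \hat{\rho}_\kappa), \hat{a}_\kappa))$

$Entry(\hat{a}_{halt}) = (e_0, \varnothing, \hat{a}_{halt})$
$Entry((e_\kappa, \hat{\rho}_\kappa)) = (e_\kappa, \hat{\rho}_\kappa, (e_\kappa, \hat{\rho}_\kappa))$
Figure 7. Well-formedness properties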

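Finally, the following assumption requires that once allocation creates an address, it always produces the same address for the same configuration, even if the value or continuation stores have changed.

Assumption 4 (Allocation consistency). Suppose that $wf(\hat{\xi}, (\hat{r}, \hat{\sigma}, \hat{\sigma}_\kappa))$ holds. Suppose further that the state step $(e, \hat{\rho}, \hat{\sigma}, \hat{\sigma}_\kappa, \hat{a}_\kappa) \rightsquigarrow_{\hat{\Sigma}} (e', \hat{\rho}'[x \mapsto \hat{a}], \hat{\sigma}'', \hat{\sigma}_\kappa'', \hat{a}_\kappa')$ holds, where $\hat{a} = \widehat{alloc}(x, (e, \hat{\rho}, \hat{\sigma}, \hat{\sigma}_\kappa, \hat{a}_\kappa))$. If there is also a result step $(\hat{r}, \hat{\sigma}, \hat{\sigma}_\kappa) \rightsquigarrow_{\hat{\Xi}} (\hat{r}', \hat{\sigma}', \hat{\sigma}_\kappa')$, then the corresponding allocation for $e$, $\hat{\rho}$, and $\hat{a}_\kappa$, but with $\hat{\sigma}'$ and $\hat{\sigma}_\kappa'$, is the same:

$\widehat{alloc}(x, (e, \hat{\rho}, \hat{\sigma}, \hat{\sigma}_\kappa, \hat{a}_\kappa)) = \widehat{alloc}(x, (e, \hat{\rho}, \hat{\sigma}', \hat{\sigma}_\kappa', \hat{a}_\kappa))$

$(e, \tilde{\rho}, \tilde{\sigma}, \tilde{\kappa}) \rightsquigarrow^{\sqsubseteq}_{\tilde{\Sigma}} (e', \tilde{\rho}', \tilde{\sigma}', \tilde{\kappa}')$, where $(e, \tilde{\rho}, \tilde{\sigma}, \tilde{\kappa}) \rightsquigarrow_{\tilde{\Sigma}} (e', \tilde{\rho}', \tilde{\sigma}'', \tilde{\kappa}')$ and $\tilde{\sigma}'' \sqsubseteq \tilde{\sigma}'$

$(e, \hat{\rho}, \hat{\sigma}, \hat{\sigma}_\kappa, \hat{a}_\kappa) \rightsquigarrow^{\sqsubseteq} (e', \hat{\rho}', \hat{\sigma}', \hat{\sigma}_\kappa', \hat{a}_\kappa')$, where
  $(e, \hat{\rho}, \hat{\sigma}'', \hat{\sigma}_\kappa'', \hat{a}_\kappa) \rightsquigarrow_{\hat{\Sigma}} (e', \hat{\rho}', \hat{\sigma}''', \hat{\sigma}_\kappa''', \hat{a}_\kappa')$
  $(\hat{\sigma}'' \sqsubseteq \hat{\sigma}) \wedge (\hat{\sigma}_\kappa'' \sqsubseteq \hat{\sigma}_\kappa) \wedge (\hat{\sigma}''' \sqsubseteq \hat{\sigma}') \wedge (\hat{\sigma}_\kappa''' \sqsubseteq \hat{\sigma}_\kappa')$

$(e, \hat{\rho}, \hat{\sigma}, \hat{\sigma}_\kappa, \hat{a}_\kappa) \rightsquigarrow^{\sigma} (e', \hat{\rho}', \hat{\sigma}', \hat{\sigma}_\kappa', \hat{a}_\kappa')$ (with $\hat{a}, \widehat{clo}_0$), where the same step and the same four sub-store conditions hold, and $\widehat{clo}_0 \in \hat{\sigma}'''(\hat{a})$

$(e, \hat{\rho}, \hat{\sigma}, \hat{\sigma}_\kappa, \hat{a}_\kappa) \rightsquigarrow^{\kappa} (e', \hat{\rho}', \hat{\sigma}', \hat{\sigma}_\kappa', \hat{a}_\kappa')$ (with $\hat{a}_\kappa', \hat{\kappa}$), where the same step and the same four sub-store conditions hold, and $\hat{\kappa} \in \hat{\sigma}_\kappa'''(\hat{a}_\kappa')$
Figure 8. Sub-step relations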

5.2 Lemmas and Theorems 

To start, lemma 5 shows that iterated steps produce a well-formed $\hat{\xi}$.

Lemma 5 (Well-formedness of analysis results). If $\hat{\xi}'$ is the result of taking zero or more steps of $(\rightsquigarrow_{\hat{\Xi}})$, starting from the initial result $(\{(e_0, \varnothing, \hat{a}_{halt})\}, \bot, \bot)$, then $wf(\hat{\xi}, \hat{\xi}')$ for some $\hat{\xi}$.

Proof. We induct over $(\rightsquigarrow_{\hat{\Xi}})$ steps. In the base case, we can easily show that the initial result is well-formed. We can also show that for any step $\hat{\xi}' \rightsquigarrow_{\hat{\Xi}} \hat{\xi}''$, if $wf(\hat{\xi}, \hat{\xi}')$, then $wf(\hat{\xi}', \hat{\xi}'')$. This is done using sublemmas for the components of well-formedness, which we omit for space. □

Next, with lemma 6 we show that every configuration paired
with one of its implied stacks has a path leading to it (i.e., the top
edge of the proof-structure figure).

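Lemma 6 (Stacks have paths). If $(e, \rho, a_\kappa) \in r$ such that
$\mathit{wf}(\mathit{Entry}, (r, \sigma, \sigma_\kappa))$, then:

$$\forall \psi \in_P a_\kappa \ (\text{via } \sigma_\kappa).\ ((e_0, \emptyset, a_{\mathit{halt}}), \epsilon) \leadsto^{*} ((e, \rho, a_\kappa), \psi) \ (\text{via } (r, \sigma, \sigma_\kappa))$$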

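Proof. By $\mathit{wf}(\mathit{Entry}, (r, \sigma, \sigma_\kappa))$, there is some $\psi' \in_P a_\kappa$ (via $\sigma_\kappa$)
for which $((e_0, \emptyset, a_{\mathit{halt}}), \epsilon) \leadsto^{*} ((e, \rho, a_\kappa), \psi')$ (via $(r, \sigma, \sigma_\kappa)$).
However, this path uses $\psi'$ instead of our desired $\psi$, so we induct
over $\psi$. If $\psi$ is the empty list, $\epsilon$, then $a_\kappa$ must be $a_{\mathit{halt}}$, and thus
$\epsilon$ is the only $\psi$ for which $\psi \in_P a_\kappa$ (via $\sigma_\kappa$). So $\psi' = \psi = \epsilon$,
and the path obtained from $\mathit{wf}(\mathit{Entry}, (r, \sigma, \sigma_\kappa))$ is exactly our desired
conclusion.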








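If $\psi$ is $((x, e_\kappa, \rho_\kappa), a_\kappa') : \psi''$ for some $x$, $e_\kappa$, $\rho_\kappa$, $a_\kappa'$, and $\psi''$, then
there is a path for $\psi'$ from $\mathit{Entry}(a_\kappa)$ to $(e, \rho, a_\kappa)$. By another
induction, there is a similar path for $\psi$:

$$(\mathit{Entry}(a_\kappa), \psi) \leadsto^{*} ((e, \rho, a_\kappa), \psi) \ (\text{via } (r, \sigma, \sigma_\kappa))$$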

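By $\mathit{wf}(\mathit{Entry}, (r, \sigma, \sigma_\kappa))$, there exist $f$ and $æ$ for a call site
$((\mathtt{let}\ ([x\ (f\ æ)])\ e_\kappa), \rho_\kappa, a_\kappa') \in r$ and a step from that call
site to $\mathit{Entry}(a_\kappa)$:

$$((\mathtt{let}\ ([x\ (f\ æ)])\ e_\kappa), \rho_\kappa, \sigma, \sigma_\kappa, a_\kappa') \leadsto (e', \rho', \sigma, \sigma_\kappa, a_\kappa)$$

where $(e', \rho', a_\kappa) = \mathit{Entry}(a_\kappa)$.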

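By the induction hypothesis, we have a path for $\psi''$ from $(e_0, \emptyset, a_{\mathit{halt}})$
to the call site:

$$((e_0, \emptyset, a_{\mathit{halt}}), \epsilon) \leadsto^{*} (((\mathtt{let}\ ([x\ (f\ æ)])\ e_\kappa), \rho_\kappa, a_\kappa'), \psi'')$$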

We now have a path from $((e_0, \emptyset, a_{\mathit{halt}}), \epsilon)$ to the call site with $\psi''$,
a step from the call site to $\mathit{Entry}(a_\kappa)$ that pushes $(x, e_\kappa, \rho_\kappa)$ onto
the stack, and a path from $\mathit{Entry}(a_\kappa)$ to $(e, \rho, a_\kappa)$ with $\psi$. From
these we can then construct the path desired in our conclusion. □
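The implied-stack relation $\psi \in_P a_\kappa$ (via $\sigma_\kappa$) used in lemma 6 can be sketched as a bounded enumeration. This sketch assumes that the continuation store maps an address to a set of `(frame, caller_address)` pairs, mirroring the shape $((x, e_\kappa, \rho_\kappa), a_\kappa') : \psi''$ in the proof, and that only $a_{\mathit{halt}}$ implies the empty stack. Because recursion can make the continuation graph cyclic, one address may imply infinitely many stacks, so the hypothetical `max_depth` cutoff is needed.

```python
HALT = "halt"  # stands in for a_halt

def implied_stacks(a_k, kstore, max_depth):
    """All implied stacks of length <= max_depth for continuation address a_k.

    kstore maps an address to a set of (frame, caller_address) pairs (an assumed
    encoding). A stack is a tuple of such pairs, top of stack first.
    """
    if a_k == HALT:
        return [()]  # only the halt address implies the empty stack
    if max_depth == 0:
        return []
    stacks = []
    for frame, caller in kstore.get(a_k, ()):
        # Push (frame, caller) on top of every stack implied by the caller's address.
        for rest in implied_stacks(caller, kstore, max_depth - 1):
            stacks.append(((frame, caller),) + rest)
    return stacks

# A self-loop models recursion: address "a" can return to itself or to halt.
kstore = {"a": {("f", "a"), ("g", HALT)}}
print(sorted(implied_stacks("a", kstore, 3), key=len))
```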

Next, with lemma 7 we show that every path in a well-formed
$\xi$ has a corresponding path in any $(\hat{r}, \hat{\sigma})$ that is at a fixed point (i.e., the
left edge of the proof-structure figure).

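Lemma 7 (Path conversion). If $(e, \rho, a_\kappa) \in r$ such that
$\mathit{wf}(\mathit{Entry}, (r, \sigma, \sigma_\kappa))$ and $(\hat{r}, \hat{\sigma}) \leadsto (\hat{r}, \hat{\sigma})$, then:

$$((e_0, \emptyset, a_{\mathit{halt}}), \epsilon) \leadsto^{*} ((e, \rho, a_\kappa), \psi) \ (\text{via } (r, \sigma, \sigma_\kappa))$$
$$\implies (e_0, \emptyset, \epsilon) \leadsto^{*} \pi_C((e, \rho, a_\kappa), \psi) \ (\text{via } \hat{r}, \hat{\sigma})$$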

Proof. By induction over the finite-state path. We have three cases. 
Case: The path is empty. Trivial. 

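Case: The last step of the path is a return. For some $x$, $æ$, $\rho'$, $a_\kappa'$,
and $\rho''$, there is a step $(æ, \rho', \sigma, \sigma_\kappa, a_\kappa') \leadsto (e, \rho, \sigma, \sigma_\kappa, a_\kappa)$
and, by the induction hypothesis, a path:

$$(e_0, \emptyset, \epsilon) \leadsto^{*} \pi_C((æ, \rho', a_\kappa'), ((x, e, \rho''), a_\kappa) : \psi) \ (\text{via } \hat{r}, \hat{\sigma})$$

where $\rho = \rho''[x \mapsto \mathit{alloc}(x, (æ, \rho', \sigma, \sigma_\kappa, a_\kappa'))]$.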

We can then show that $(\hat{r}, \hat{\sigma})$ contains a step corresponding to the
step in $(r, \sigma, \sigma_\kappa)$:

$$(æ, \pi_E(\rho'), \hat{\sigma}, (x, e, \pi_E(\rho'')) : \pi_K(\psi)) \leadsto (e, \pi_E(\rho), \hat{\sigma}, \pi_K(\psi))$$

Combining this with the path from the induction hypothesis, we 
can then construct the path in our conclusion. 

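Case: The last step of the path is a call. For some $x$, $y$, $f$, $æ$, $e'$,
$\rho'$, $\rho_\lambda$, $a_\kappa'$, and $\psi'$, we have $(y, e, \rho_\lambda) \in \mathcal{A}(f, \rho', \sigma)$ and a step:

$$((\mathtt{let}\ ([x\ (f\ æ)])\ e'), \rho', \sigma, \sigma_\kappa, a_\kappa') \leadsto (e, \rho, \sigma, \sigma_\kappa, a_\kappa)$$

where $\rho = \rho_\lambda[y \mapsto \mathit{alloc}(y, ((\mathtt{let}\ ([x\ (f\ æ)])\ e'), \rho', \sigma, \sigma_\kappa, a_\kappa'))]$
and, by the induction hypothesis, a path:

$$(e_0, \emptyset, \epsilon) \leadsto^{*} \pi_C(((\mathtt{let}\ ([x\ (f\ æ)])\ e'), \rho', a_\kappa'), \psi') \ (\text{via } \hat{r}, \hat{\sigma})$$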

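We can then show that $(\hat{r}, \hat{\sigma})$ contains a step corresponding to the
step in $(r, \sigma, \sigma_\kappa)$:

$$((\mathtt{let}\ ([x\ (f\ æ)])\ e'), \pi_E(\rho'), \hat{\sigma}, \pi_K(\psi')) \leadsto (e, \pi_E(\rho), \hat{\sigma}, (x, e', \pi_E(\rho')) : \pi_K(\psi'))$$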

Combining this with the path from the induction hypothesis, we 
can then construct the path in our conclusion. □ 
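The projection $\pi_C$ used in lemma 7 can be sketched as follows. The definition of $\pi_E$ is not reproduced in this section, so the sketch takes it as a parameter and defaults it to the identity purely as a placeholder. $\pi_K$ is assumed to drop the caller addresses from an implied stack, encoded as a tuple of `((x, e, rho), caller_address)` pairs, while projecting each frame's environment.

```python
def pi_K(psi, pi_E):
    """Project an implied stack to an explicit stack: keep frames, drop caller addresses."""
    return tuple((x, e, pi_E(rho)) for ((x, e, rho), _caller) in psi)

def pi_C(config, psi, pi_E=lambda rho: rho):
    """Map a finite-state configuration (e, rho, a_k) and implied stack psi to an
    unbounded-stack state (e, pi_E(rho), pi_K(psi)). pi_E defaults to a placeholder identity."""
    e, rho, _a_k = config
    return (e, pi_E(rho), pi_K(psi, pi_E))

psi = ((("y", "k1", "r1"), "a1"), (("z", "k2", "r2"), "halt"))
print(pi_C(("e", "r", "a0"), psi))
```

The return case relies on popping a frame commuting with the projection: `pi_K(psi[1:], f) == pi_K(psi, f)[1:]` for any environment projection `f`, which holds here because `pi_K` maps each stack entry independently.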


Then, with lemma 8 we show that the endpoint of any path in $(\hat{r}, \hat{\sigma})$
is in $\hat{r}$ (i.e., the bottom edge of the proof-structure figure).

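Lemma 8 (Path endpoint). If $(\hat{r}, \hat{\sigma}) \leadsto (\hat{r}, \hat{\sigma})$, then for any path
$(e_0, \emptyset, \epsilon) \leadsto^{*} (e, \rho, \kappa)$ (via $\hat{r}, \hat{\sigma}$), we have $(e, \rho, \kappa) \in \hat{r}$.

Proof. By a straightforward induction over the path. □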
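Lemma 8 is an instance of the familiar fact that a set closed under a step relation, and containing the initial state, contains every state reachable from it. The worklist sketch below illustrates this over an abstract successor function; ignoring the store and using a small hand-written graph are simplifying assumptions for illustration only.

```python
from collections import deque

def least_fixed_point(initial, successors):
    """Smallest set containing `initial` and closed under `successors` (worklist search)."""
    seen = {initial}
    work = deque([initial])
    while work:
        state = work.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

def is_fixed_point(states, successors):
    """Stepping every state in the set adds nothing new."""
    return all(set(successors(s)) <= states for s in states)

graph = {0: [1, 2], 1: [3], 2: [], 3: [1], 4: [0]}
reachable = least_fixed_point(0, lambda s: graph[s])
print(sorted(reachable))
```

State 4 has an edge into the reachable region but is not itself reachable from 0, so it is correctly absent from the result.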

Finally, with lemmas 9 and 10 we show that precision is
preserved by the step relation (i.e., the right edge of the proof-structure figure).
Then, in theorem 11, we show that every analysis result reached by
iterating steps is precise, which is ultimately what we want to prove.

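Lemma 9 (Preservation of precision for value stores). If $(\hat{r}, \hat{\sigma}) \leadsto (\hat{r}, \hat{\sigma})$,
$\mathit{wf}(\mathit{Entry}, (r, \sigma, \sigma_\kappa))$, $(r, \sigma, \sigma_\kappa) \leadsto (r', \sigma', \sigma_\kappa')$, and
$(\hat{r}, \hat{\sigma}) \sqsupseteq (r, \sigma, \sigma_\kappa)$, then $\hat{\sigma} \sqsupseteq \sigma'$.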

Proof. Omitted for space. □ 

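Lemma 10 (Preservation of precision for reachable configurations).
If $(\hat{r}, \hat{\sigma}) \leadsto (\hat{r}, \hat{\sigma})$, $\mathit{wf}(\mathit{Entry}, (r, \sigma, \sigma_\kappa))$,
$(r, \sigma, \sigma_\kappa) \leadsto (r', \sigma', \sigma_\kappa')$, and $(\hat{r}, \hat{\sigma}) \sqsupseteq (r, \sigma, \sigma_\kappa)$, then $\hat{r} \sqsupseteq r'$.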

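Proof. If we unfold the definition of ($\sqsupseteq$), we must show that for all
$(e, \rho, a_\kappa) \in r'$ and $\psi \in_P a_\kappa$ (via $\sigma_\kappa'$), we have $(e, \pi_E(\rho), \pi_K(\psi)) \in \hat{r}$.
By lemma 6 we have:

$$((e_0, \emptyset, a_{\mathit{halt}}), \epsilon) \leadsto^{*} ((e, \rho, a_\kappa), \psi) \ (\text{via } (r', \sigma', \sigma_\kappa'))$$

From this, by lemma 7 we have:

$$(e_0, \emptyset, \epsilon) \leadsto^{*} (e, \pi_E(\rho), \pi_K(\psi)) \ (\text{via } \hat{r}, \hat{\sigma})$$

Finally, by lemma 8 we have our conclusion. □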

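Theorem 11 (Precision of analysis results). If $\xi$ is the result of taking
zero or more steps of ($\leadsto$), starting from $(\{(e_0, \emptyset, a_{\mathit{halt}})\}, \bot, \bot)$,
and $(\hat{r}, \hat{\sigma}) \leadsto (\hat{r}, \hat{\sigma})$, then $(\hat{r}, \hat{\sigma}) \sqsupseteq \xi$.

Proof. By induction over the number of steps, using routine simplifications
and unfoldings of definitions together with lemmas 5, 9, and 10. □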

6. Conclusion 

Traditional control-flow analysis has long suffered from return-flow
conflation of values, even when context sensitivity and related
techniques keep these values separate across function calls.
Recent approaches have made significant progress in addressing
this problem. However, each suffers from serious drawbacks. PDCFA
incurs a substantial development cost and causes a quadratic-factor
increase in run-time complexity. AAC is trivial to implement,
but incurs an even worse increase in run-time complexity. Our
approach, however, is both simple to implement and adds no
asymptotic cost to run-time complexity. To accomplish this, we synthesize
the lessons learned from PDCFA and AAC to show that the ideal
continuation address is simply a function's polyvariant entry point:
its expression and abstract binding environment. This introspection
on entry points and the corresponding choice of continuation
address yields a finite-state analysis whose call transitions are
precisely matched with return transitions at no cost in either
run-time or development-time overhead.
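To make the central idea concrete, here is a minimal Python sketch, not the authors' implementation, of a finite-state analysis with store-allocated continuations, run on the identity-function example from the introduction. The ANF encoding, the 1-call-site parameter addresses, the naive global-store fixed-point loop, and the `expression_only_kalloc` baseline are all assumptions made for illustration. Only `entry_point_kalloc`, which uses the callee's expression and environment as the continuation address, reflects the technique described above.

```python
# Hypothetical ANF encoding assumed for this sketch:
#   ("letv", x, atom, body)          (let ([x atom]) body)
#   ("call", site, x, f, arg, body)  (let ([x (f arg)]) body); `site` labels the call
#   ("ret", atom)                    return atom to the current continuation
# Atoms: ("var", name) | ("lam", param, body) | ("bool", b)
HALT = "halt"

def env_ext(env, x, addr):
    # Environments are sorted tuples of (variable, address) pairs so they are hashable.
    d = dict(env)
    d[x] = addr
    return tuple(sorted(d.items()))

def join(store, addr, vals):
    store.setdefault(addr, set()).update(vals)

def atomic(atom, env, store):
    if atom[0] == "var":
        return frozenset(store.get(dict(env)[atom[1]], ()))
    if atom[0] == "lam":
        return frozenset({("clo", atom, env)})
    return frozenset({atom})

def step(state, store, kstore, kalloc):
    """Successors of one state; the global stores only grow (join)."""
    e, env, ak = state
    succs = set()
    if e[0] == "letv":
        _, x, atom, body = e
        join(store, (x, "top"), atomic(atom, env, store))
        succs.add((body, env_ext(env, x, (x, "top")), ak))
    elif e[0] == "call":
        _, site, x, f, arg, body = e
        args = atomic(arg, env, store)
        for val in atomic(f, env, store):
            if val[0] != "clo":
                continue
            _, (_, param, lam_body), clo_env = val
            addr = (param, site)  # 1-call-site-sensitive parameter address
            join(store, addr, args)
            callee_env = env_ext(clo_env, param, addr)
            new_ak = kalloc(lam_body, callee_env)
            # A continuation records the binder, the rest of the caller, its env, and its address.
            join(kstore, new_ak, {(x, body, env, ak)})
            succs.add((lam_body, callee_env, new_ak))
    else:  # "ret": pass the value to every continuation stored at ak
        vals = atomic(e[1], env, store)
        for (x, body, k_env, k_ak) in frozenset(kstore.get(ak, ())):
            join(store, (x, "ret"), vals)
            succs.add((body, env_ext(k_env, x, (x, "ret")), k_ak))
    return succs

def analyze(program, kalloc):
    store, kstore = {}, {}
    states = {(program, (), HALT)}
    def size():
        return (len(states) + sum(map(len, store.values()))
                + sum(map(len, kstore.values())))
    while True:  # naive global-store fixed-point iteration
        before = size()
        for s in list(states):
            states |= step(s, store, kstore, kalloc)
        if size() == before:
            return store

def entry_point_kalloc(lam_body, callee_env):
    return (lam_body, callee_env)  # continuation address = callee's (e, rho)

def expression_only_kalloc(lam_body, callee_env):
    return lam_body  # hypothetical baseline: ignores the callee's environment

ID = ("lam", "x", ("ret", ("var", "x")))
PROGRAM = ("letv", "id", ID,
           ("call", "s1", "y", ("var", "id"), ("bool", True),
            ("call", "s2", "z", ("var", "id"), ("bool", False),
             ("ret", ("var", "z")))))

precise = analyze(PROGRAM, entry_point_kalloc)
merged = analyze(PROGRAM, expression_only_kalloc)
print(precise[("y", "ret")], precise[("z", "ret")])
print(merged[("y", "ret")])
```

With `entry_point_kalloc`, the two calls to `id` enter at different environments, so their continuations live at different addresses: `y` receives only `#t` and `z` only `#f`. With `expression_only_kalloc`, both continuations share one address and each call's result flows to both `y` and `z`. Both allocators ignore the stores, so they also satisfy Assumption 4.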